<!--

    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.

-->

# Programming - TsFile API

TsFile is the time-series file format used in IoTDB. This section introduces how to use this file format.

## TsFile library Installation

There are two ways to use TsFile in your own project.

* Using as jars:
	* Compile the source code and build the jars:
	
		```
		git clone https://github.com/apache/incubator-iotdb.git
		cd incubator-iotdb/tsfile/
		mvn clean package -Dmaven.test.skip=true
		```
		Then all the jars can be found in the folder named `target/`. Import `target/tsfile-0.10.0-jar-with-dependencies.jar` into your project.
	
* Using as a maven dependency: 

  Compile the source code and deploy it to your local repository in three steps:

  * Get the source code

  	```
  	git clone https://github.com/apache/incubator-iotdb.git
  	```
  * Compile the source code and deploy it
  	
  	```
  	cd incubator-iotdb/tsfile/
  	mvn clean install -Dmaven.test.skip=true
  	```
  * Add the dependency to your project:

    ```
  	 <dependency>
  	   <groupId>org.apache.iotdb</groupId>
  	   <artifactId>tsfile</artifactId>
  	   <version>0.10.0</version>
  	 </dependency>
    ```
    

  Or, you can download the dependency from the official Maven repository:

  * First, find your Maven `settings.xml` at `${username}\.m2\settings.xml` and add this `<profile>` to `<profiles>`:
    ```
      <profile>
           <id>allow-snapshots</id>
              <activation><activeByDefault>true</activeByDefault></activation>
           <repositories>
             <repository>  
                <id>apache.snapshots</id>
                <name>Apache Development Snapshot Repository</name>
                <url>https://repository.apache.org/content/repositories/snapshots/</url>
                <releases>
                    <enabled>false</enabled>
                </releases>
                <snapshots>
                    <enabled>true</enabled>
                </snapshots>
              </repository>
           </repositories>
         </profile>
    ```
  * Then add dependencies into your project:

    ```
  	 <dependency>
  	   <groupId>org.apache.iotdb</groupId>
  	   <artifactId>tsfile</artifactId>
  	   <version>0.10.0</version>
  	 </dependency>
    ```

## TsFile Usage
This section demonstrates the detailed usage of TsFile.

### Time-series Data
A time-series is considered as a sequence of quadruples. A quadruple is defined as (device, measurement, time, value).

* **measurement**: A physical or formal measurement that a time-series records, e.g., the temperature of a city, the 
sales number of some goods or the speed of a train at different times. As a traditional sensor (like a thermometer) also
 takes a single measurement and produces a time-series, we will use measurement and sensor interchangeably below.

* **device**: A device refers to an entity that takes several measurements (producing multiple time-series), e.g., 
a running train monitors its speed, oil meter, miles it has run and current passengers, each of which is recorded as a time-series.

Table 1 illustrates a set of time-series data. The set shown in the following table contains one device named "device\_1" 
with three measurements named "sensor\_1", "sensor\_2" and "sensor\_3". 

<center>
<table style="text-align:center">
    <tr><th colspan="6">device_1</th></tr>
    <tr><th colspan="2">sensor_1</th><th colspan="2">sensor_2</th><th colspan="2">sensor_3</th></tr>
    <tr><th>time</th><th>value</th><th>time</th><th>value</th><th>time</th><th>value</th></tr>
    <tr><td>1</td><td>1.2</td><td>1</td><td>20</td><td>2</td><td>50</td></tr>
    <tr><td>3</td><td>1.4</td><td>2</td><td>20</td><td>4</td><td>51</td></tr>
    <tr><td>5</td><td>1.1</td><td>3</td><td>21</td><td>6</td><td>52</td></tr>
    <tr><td>7</td><td>1.8</td><td>4</td><td>20</td><td>8</td><td>53</td></tr>
</table>
<span>A set of time-series data</span>
</center>

**One Line of Data**: In many industrial applications, a device normally contains more than one sensor and these sensors
 may have values at the same timestamp. Such a set of values is called one line of data. 

Formally, one line of data consists of a `device_id`, a timestamp which indicates the milliseconds since January 1,
 1970, 00:00:00, and several data pairs composed of `measurement_id` and corresponding `value`. All data pairs in one 
 line belong to this `device_id` and have the same timestamp. If one of the `measurements` does not have a `value` 
 at the `timestamp`, use a space instead (TsFile does not actually store null values). The format is as follows:

```
device_id, timestamp, <measurement_id, value>...
```

An example is shown below. In this example, the data types of the two measurements are `INT32` and `FLOAT` respectively.

```
device_1, 1490860659000, m1, 10, m2, 12.12
```
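
To make the relation between Table 1 and a "line of data" concrete, the following is a minimal sketch that groups the per-sensor values of Table 1 by timestamp. It uses only plain Java collections, not the TsFile API; the class name `LineOfDataSketch` and its methods are hypothetical and exist only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain Java collections, not the TsFile API.
class LineOfDataSketch {

  // Groups per-sensor (time, value) pairs by timestamp:
  // timestamp -> (measurement_id -> value).
  static TreeMap<Long, Map<String, Object>> toLines(Map<String, long[]> times,
                                                    Map<String, Object[]> values) {
    TreeMap<Long, Map<String, Object>> lines = new TreeMap<>();
    for (Map.Entry<String, long[]> e : times.entrySet()) {
      String sensor = e.getKey();
      long[] ts = e.getValue();
      Object[] vs = values.get(sensor);
      for (int i = 0; i < ts.length; i++) {
        lines.computeIfAbsent(ts[i], k -> new LinkedHashMap<>()).put(sensor, vs[i]);
      }
    }
    return lines;
  }

  // Builds the lines of data for device_1 from the values of Table 1.
  static TreeMap<Long, Map<String, Object>> table1() {
    Map<String, long[]> times = new LinkedHashMap<>();
    Map<String, Object[]> values = new LinkedHashMap<>();
    times.put("sensor_1", new long[] {1, 3, 5, 7});
    values.put("sensor_1", new Object[] {1.2f, 1.4f, 1.1f, 1.8f});
    times.put("sensor_2", new long[] {1, 2, 3, 4});
    values.put("sensor_2", new Object[] {20, 20, 21, 20});
    times.put("sensor_3", new long[] {2, 4, 6, 8});
    values.put("sensor_3", new Object[] {50, 51, 52, 53});
    return toLines(times, values);
  }

  public static void main(String[] args) {
    // Print each line as: device_id, timestamp, <measurement_id, value>...
    for (Map.Entry<Long, Map<String, Object>> line : table1().entrySet()) {
      StringBuilder sb = new StringBuilder("device_1, " + line.getKey());
      line.getValue().forEach((m, v) -> sb.append(", ").append(m).append(", ").append(v));
      System.out.println(sb);
    }
  }
}
```

With the data of Table 1 this yields eight lines, one per distinct timestamp. Sensors without a value at a timestamp are simply absent from that line, which mirrors the statement that null values are not stored.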


### Writing TsFile

#### Generate a TsFile
A TsFile can be generated by following the steps below. The complete code is given in the section "Example for writing a TsFile".

* First, construct a `TsFileWriter` instance.
  
    Here are the available constructors:
    
    * Without pre-defined schema
    ```
    public TsFileWriter(File file) throws IOException
    ```
    * With pre-defined schema
    ```
    public TsFileWriter(File file, Schema schema) throws IOException
    ```
    This one is for using the HDFS file system. `TsFileOutput` can be an instance of class `HDFSOutput`.
    
    ```
    public TsFileWriter(TsFileOutput output, Schema schema) throws IOException 
    ```
    
    If you want to set some TsFile configuration on your own, you can use the `config` parameter. For example:
    ```
    TSFileConfig conf = new TSFileConfig();
    conf.setTSFileStorageFs("HDFS");
    TsFileWriter tsFileWriter = new TsFileWriter(file, schema, conf);
    ```
    In this example, data files will be stored in HDFS instead of the local file system. If you'd like to store data files in the local file system, you can use `conf.setTSFileStorageFs("LOCAL")`, which is also the default config.
    
    You can also configure the IP and port of your HDFS with `config.setHdfsIp(...)` and `config.setHdfsPort(...)`. The default IP is `localhost` and the default port is `9000`.
    
    **Parameters:**
    
    * file : The TsFile to write
    
    * schema : The file schema, which is introduced in the next part.
    
    * config : The config of TsFile.

* Second, add measurements.
  
    You can create an instance of class `Schema` first and pass it to the constructor of class `TsFileWriter`.
    
    The class `Schema` contains a map whose key is the name of one measurement schema, and the value is the schema itself.
    
    Here are the interfaces:
    ```
    // Create an empty Schema or from an existing map
    public Schema()
    public Schema(Map<String, MeasurementSchema> measurements)
    // Use these two interfaces to add measurements
    public void registerMeasurement(MeasurementSchema descriptor)
    public void registerMeasurements(Map<String, MeasurementSchema> measurements)
    // Some useful getters and checkers
    public TSDataType getMeasurementDataType(String measurementId)
    public MeasurementSchema getMeasurementSchema(String measurementId)
    public Map<String, MeasurementSchema> getAllMeasurementSchema()
    public boolean hasMeasurement(String measurementId)
    ```
    
    You can always use the following interface in `TsFileWriter` class to add additional measurements: 
    ```
    public void addMeasurement(MeasurementSchema measurementSchema) throws WriteProcessException
    ```
    
    The class `MeasurementSchema` contains the information of one measurement. There are several constructors:
    ```
    public MeasurementSchema(String measurementId, TSDataType type, TSEncoding encoding)
    public MeasurementSchema(String measurementId, TSDataType type, TSEncoding encoding, CompressionType compressionType)
    public MeasurementSchema(String measurementId, TSDataType type, TSEncoding encoding, CompressionType compressionType, 
    Map<String, String> props)
    ```
    
    **Parameters:**

    * measurementId: The name of this measurement, typically the name of the sensor.
      
    * type: The data type. Six types are currently supported: `BOOLEAN`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `TEXT`.
    
    * encoding: The data encoding. See [Chapter 2-3](/document/master/UserGuide/2-Concept/3-Encoding.html).
    
    * compressionType: The data compression. `UNCOMPRESSED` and `SNAPPY` are currently supported.
    
    * props: Properties for special data types, such as `max_point_number` for `FLOAT` and `DOUBLE`, and `max_string_length` for
    `TEXT`. Pass them as string pairs in a map, e.g. ("max_point_number", "3").
    
    > **Notice:** Although one measurement name can be used in multiple devices, its properties cannot be changed, i.e., 
        it is not allowed to add the same measurement name multiple times with a different type or encoding.
        Here is a bad example:
    
        // The measurement "sensor_1" is float type
        addMeasurement(new MeasurementSchema("sensor_1", TSDataType.FLOAT, TSEncoding.RLE));
        
        // This call will throw a WriteProcessException
        addMeasurement(new MeasurementSchema("sensor_1", TSDataType.INT32, TSEncoding.RLE));
* Third, insert and write data continually.
  
    Use this constructor to create a new `TSRecord` (a timestamp and device pair).
    
    ```
    public TSRecord(long timestamp, String deviceId)
    ```
    Then create a `DataPoint` (a measurement and value pair), and use the `addTuple` method to add the `DataPoint` to the correct
    `TSRecord`.
    
    Use this method to write the record:
    
    ```
    public void write(TSRecord record) throws IOException, WriteProcessException
    ```
    
* Finally, call `close` to finish this writing process. 
  
    ```
    public void close() throws IOException
    ```

#### Example for writing a TsFile

You should first install TsFile to your local Maven repository:
```
mvn clean install -pl tsfile -am -DskipTests
```

You can write a TsFile by constructing **TSRecord** objects if you have **non-aligned** time series data (e.g. not all sensors contain values).

A more thorough example can be found at `/example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileWriteWithTSRecord.java`

```java
package org.apache.iotdb.tsfile;

import java.io.File;
import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType;
import org.apache.iotdb.tsfile.file.metadata.enums.TSEncoding;
import org.apache.iotdb.tsfile.write.TsFileWriter;
import org.apache.iotdb.tsfile.write.record.TSRecord;
import org.apache.iotdb.tsfile.write.record.datapoint.DataPoint;
import org.apache.iotdb.tsfile.write.record.datapoint.LongDataPoint;
import org.apache.iotdb.tsfile.write.schema.MeasurementSchema;
/**
 * An example of writing data to TsFile
 * It uses the interface:
 * public void addMeasurement(MeasurementSchema measurementSchema) throws WriteProcessException
 */
public class TsFileWriteWithTSRecord {

  public static void main(String[] args) {
    try {
      String path = "test.tsfile";
      File f = new File(path);
      if (f.exists()) {
        f.delete();
      }
      TsFileWriter tsFileWriter = new TsFileWriter(f);

      // add measurements into file schema
      tsFileWriter
          .addMeasurement(new MeasurementSchema("sensor_1", TSDataType.INT64, TSEncoding.RLE));
      tsFileWriter
          .addMeasurement(new MeasurementSchema("sensor_2", TSDataType.INT64, TSEncoding.RLE));
      tsFileWriter
          .addMeasurement(new MeasurementSchema("sensor_3", TSDataType.INT64, TSEncoding.RLE));

      // construct TSRecord
      TSRecord tsRecord = new TSRecord(1, "device_1");
      DataPoint dPoint1 = new LongDataPoint("sensor_1", 1);
      DataPoint dPoint2 = new LongDataPoint("sensor_2", 2);
      DataPoint dPoint3 = new LongDataPoint("sensor_3", 3);
      tsRecord.addTuple(dPoint1);
      tsRecord.addTuple(dPoint2);
      tsRecord.addTuple(dPoint3);

      // write TSRecord
      tsFileWriter.write(tsRecord);

      // close TsFile
      tsFileWriter.close();
    } catch (Throwable e) {
      e.printStackTrace();
      System.out.println(e.getMessage());
    }
  }
}

```

You can write a TsFile by constructing a **RowBatch** if you have **aligned** time series data.

A more thorough example can be found at `/example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileWriteWithRowBatch.java`

```java
package org.apache.iotdb.tsfile;

import java.io.File;
import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType;
import org.apache.iotdb.tsfile.file.metadata.enums.TSEncoding;
import org.apache.iotdb.tsfile.write.TsFileWriter;
import org.apache.iotdb.tsfile.write.schema.Schema;
import org.apache.iotdb.tsfile.write.schema.MeasurementSchema;
import org.apache.iotdb.tsfile.write.record.RowBatch;
/**
 * An example of writing data with RowBatch to TsFile
 */
public class TsFileWriteWithRowBatch {

  public static void main(String[] args) {
    try {
      String path = "test.tsfile";
      File f = new File(path);
      if (f.exists()) {
        f.delete();
      }

      Schema schema = new Schema();

      // the number of rows to include in the row batch
      int rowNum = 1000000;
      // the number of values to include in the row batch
      int sensorNum = 10;

      // add measurements into file schema (all with INT64 data type)
      for (int i = 0; i < sensorNum; i++) {
        schema.registerMeasurement(
                new MeasurementSchema("sensor_" + (i + 1), TSDataType.INT64, TSEncoding.TS_2DIFF));
      }

      // add measurements into TSFileWriter
      TsFileWriter tsFileWriter = new TsFileWriter(f, schema);

      // construct the row batch
      RowBatch rowBatch = schema.createRowBatch("device_1");

      long[] timestamps = rowBatch.timestamps;
      Object[] values = rowBatch.values;

      long timestamp = 1;
      long value = 1000000L;

      for (int r = 0; r < rowNum; r++, value++) {
        int row = rowBatch.batchSize++;
        timestamps[row] = timestamp++;
        for (int i = 0; i < sensorNum; i++) {
          long[] sensor = (long[]) values[i];
          sensor[row] = value;
        }
        // write RowBatch to TsFile once it is full
        if (rowBatch.batchSize == rowBatch.getMaxBatchSize()) {
          tsFileWriter.write(rowBatch);
          rowBatch.reset();
        }
      }
      // write the remaining rows of the last RowBatch to TsFile
      if (rowBatch.batchSize != 0) {
        tsFileWriter.write(rowBatch);
        rowBatch.reset();
      }

      // close TsFile
      tsFileWriter.close();
    } catch (Throwable e) {
      e.printStackTrace();
      System.out.println(e.getMessage());
    }
  }
}

```
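
The fill, flush and reset pattern used with `RowBatch` above can be looked at in isolation. The sketch below models it with a plain array instead of the TsFile classes; `BatchFlushSketch` and `flushSizes` are hypothetical names, and "flushing" is reduced to recording how many rows each write would contain.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models the fill/flush/reset pattern with a plain array,
// not with the TsFile RowBatch class.
class BatchFlushSketch {

  // Buffers rowNum rows in a batch of maxBatchSize rows and returns the size of each flush.
  static List<Integer> flushSizes(int rowNum, int maxBatchSize) {
    if (maxBatchSize <= 0) {
      throw new IllegalArgumentException("maxBatchSize must be positive");
    }
    long[] timestamps = new long[maxBatchSize];
    int batchSize = 0;
    List<Integer> flushes = new ArrayList<>();
    for (int r = 0; r < rowNum; r++) {
      timestamps[batchSize++] = r + 1;   // append one row to the batch
      if (batchSize == maxBatchSize) {   // batch is full: write it out and start over
        flushes.add(batchSize);
        batchSize = 0;
      }
    }
    if (batchSize != 0) {                // write the rows left in the last, partial batch
      flushes.add(batchSize);
    }
    return flushes;
  }

  public static void main(String[] args) {
    System.out.println(flushSizes(2500, 1024)); // prints [1024, 1024, 452]
  }
}
```

The final check after the loop matters: without it, the rows of the last partial batch would never be written.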

### Interface for Reading TsFile

#### Before the Start

The set of time-series data from the section "Time-series Data" is used here as a concrete example. The set shown in the following table contains one device named "device\_1" with three measurements named "sensor\_1", "sensor\_2" and "sensor\_3". For simplicity, each measurement contains only 4 time-value pairs.

<center>
<table style="text-align:center">
    <tr><th colspan="6">device_1</th></tr>
    <tr><th colspan="2">sensor_1</th><th colspan="2">sensor_2</th><th colspan="2">sensor_3</th></tr>
    <tr><th>time</th><th>value</th><th>time</th><th>value</th><th>time</th><th>value</th></tr>
    <tr><td>1</td><td>1.2</td><td>1</td><td>20</td><td>2</td><td>50</td></tr>
    <tr><td>3</td><td>1.4</td><td>2</td><td>20</td><td>4</td><td>51</td></tr>
    <tr><td>5</td><td>1.1</td><td>3</td><td>21</td><td>6</td><td>52</td></tr>
    <tr><td>7</td><td>1.8</td><td>4</td><td>20</td><td>8</td><td>53</td></tr>
</table>
<span>A set of time-series data</span>
</center>

#### Definition of Path

A path is a dot-separated string which uniquely identifies a time-series in TsFile, e.g., "root.area_1.device_1.sensor_1". 
The last section "sensor_1" is called the "measurementId" while the remaining part "root.area_1.device_1" is called the "deviceId". 
As mentioned above, the same measurement in different devices has the same data type and encoding, and devices are also unique.

In the read interfaces, the parameter ```paths``` indicates the measurements to be selected.

A path instance can be easily constructed through the class ```Path```. For example:

```
Path p = new Path("device_1.sensor_1");
```

We pass a list of paths to the final query call to support multiple paths.

```
List<Path> paths = new ArrayList<Path>();
paths.add(new Path("device_1.sensor_1"));
paths.add(new Path("device_1.sensor_3"));
```
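
The splitting rule for paths (the last dot-separated part is the measurementId, everything before it is the deviceId) can be sketched as follows. This is a standalone illustration of the rule, not the library's `Path` class; `PathSplitSketch` is a hypothetical name, and rejecting strings without a usable dot is an assumption of this sketch rather than documented library behavior.

```java
// Illustrative only: mirrors the splitting rule described in this section,
// not the implementation of the TsFile Path class.
class PathSplitSketch {

  // Returns {deviceId, measurementId} for a dot-separated path.
  static String[] split(String path) {
    int lastDot = path.lastIndexOf('.');
    // Assumption of this sketch: both parts must be non-empty.
    if (lastDot <= 0 || lastDot == path.length() - 1) {
      throw new IllegalArgumentException("expected <deviceId>.<measurementId>: " + path);
    }
    return new String[] {path.substring(0, lastDot), path.substring(lastDot + 1)};
  }

  public static void main(String[] args) {
    String[] parts = split("root.area_1.device_1.sensor_1");
    System.out.println("deviceId=" + parts[0] + ", measurementId=" + parts[1]);
  }
}
```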

> **Notice:** When constructing a Path, the format of the parameter should be a dot-separated string, the last part will
 be recognized as measurementId while the remaining parts will be recognized as deviceId.


#### Definition of Filter

##### Usage Scenario
Filters are used in the TsFile reading process to select data satisfying one or more given conditions. 

##### IExpression
The `IExpression` is a filter expression interface and it will be passed to our final query call.
We create one or more filter expressions and may use binary filter operators to link them to our final expression.

* **Create a Filter Expression**
  
    There are two types of filters.
    
     * TimeFilter: A filter for `time` in time-series data.
        ```
        IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter);
        ```
        Use the following relationships to get a `TimeFilter` object (`value` is a `long` variable).
        <center>
        <table style="text-align:center">
            <tr><th>Relationship</th><th>Description</th></tr>
            <tr><td>TimeFilter.eq(value)</td><td>Choose the time equal to the value</td></tr>
            <tr><td>TimeFilter.lt(value)</td><td>Choose the time less than the value</td></tr>
            <tr><td>TimeFilter.gt(value)</td><td>Choose the time greater than the value</td></tr>
            <tr><td>TimeFilter.ltEq(value)</td><td>Choose the time less than or equal to the value</td></tr>
            <tr><td>TimeFilter.gtEq(value)</td><td>Choose the time greater than or equal to the value</td></tr>
            <tr><td>TimeFilter.notEq(value)</td><td>Choose the time not equal to the value</td></tr>
            <tr><td>TimeFilter.not(TimeFilter)</td><td>Choose the time that does not satisfy another TimeFilter</td></tr>
        </table>
        </center>
        
     * ValueFilter: A filter for `value` in time-series data.
       
        ```
        IExpression valueFilterExpr = new SingleSeriesExpression(Path, ValueFilter);
        ```
        `ValueFilter` is used in the same way as `TimeFilter`. Just make sure that the type of the value
        matches the type of the measurement (defined in the path).

* **Binary Filter Operators**

    Binary filter operators can be used to link two single expressions.

     * BinaryExpression.and(Expression, Expression): Choose the values that satisfy both expressions.
     * BinaryExpression.or(Expression, Expression): Choose the values that satisfy at least one expression.

##### Filter Expression Examples

* **TimeFilterExpression Examples**

    ```
    IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.eq(15)); // series time = 15
    ```
    ```
    IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.ltEq(15)); // series time <= 15
    ```
    ```
    IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.lt(15)); // series time < 15
    ```
    ```
    IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.gtEq(15)); // series time >= 15
    ```
    ```
    IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.notEq(15)); // series time != 15
    ```
    ```
    IExpression timeFilterExpr = BinaryExpression.and(new GlobalTimeExpression(TimeFilter.gtEq(15L)),
                                             new GlobalTimeExpression(TimeFilter.lt(25L))); // 15 <= series time < 25
    ```
    ```
    IExpression timeFilterExpr = BinaryExpression.or(new GlobalTimeExpression(TimeFilter.gtEq(15L)),
                                             new GlobalTimeExpression(TimeFilter.lt(25L))); // series time >= 15 or series time < 25
    ```
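
To see what the `and` and `or` combinations in the last two examples select, their semantics can be modelled with `java.util.function.LongPredicate` from the Java standard library. This sketch does not use the TsFile filter classes; `TimeFilterSketch` and its helper methods are hypothetical names that only mirror `TimeFilter.gtEq` and `TimeFilter.lt`.

```java
import java.util.function.LongPredicate;

// Illustrative only: models time-filter semantics with LongPredicate,
// not with the TsFile filter classes.
class TimeFilterSketch {

  static LongPredicate gtEq(long v) { return t -> t >= v; }
  static LongPredicate lt(long v) { return t -> t < v; }

  // 15 <= time < 25
  static final LongPredicate AND = gtEq(15).and(lt(25));
  // time >= 15 or time < 25
  static final LongPredicate OR = gtEq(15).or(lt(25));

  public static void main(String[] args) {
    for (long t : new long[] {10, 15, 24, 25, 30}) {
      System.out.println(t + ": and=" + AND.test(t) + ", or=" + OR.test(t));
    }
  }
}
```

Note that the `or` combination of `>= 15` and `< 25` is satisfied by every timestamp, so that expression does not actually restrict the result. An `or` filter is only selective when the two ranges leave a gap between them.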
#### Read Interface

First, we open the TsFile and get a `ReadOnlyTsFile` instance from a file path string `path`.

```
TsFileSequenceReader reader = new TsFileSequenceReader(path);
   
ReadOnlyTsFile readTsFile = new ReadOnlyTsFile(reader);
```
Next, we prepare the list of paths and the query expression, then get the final `QueryExpression` object through this interface:

```
QueryExpression queryExpression = QueryExpression.create(paths, statement);
```

The `ReadOnlyTsFile` class has two `query` methods to perform a query.
* **Method 1**

    ```
    public QueryDataSet query(QueryExpression queryExpression) throws IOException
    ```

* **Method 2**

    ```
    public QueryDataSet query(QueryExpression queryExpression, long partitionStartOffset, long partitionEndOffset) throws IOException
    ```

    This method is designed for advanced applications such as the TsFile-Spark Connector.

    * **params** : For method 2, two additional parameters are added to support partial query:
        *  ```partitionStartOffset```: start offset for a TsFile
        *  ```partitionEndOffset```: end offset for a TsFile

        > **What is Partial Query?**
        >
        > In some distributed file systems (e.g. HDFS), a file is split into several parts called "Blocks", which are stored on different nodes. Executing a query in parallel on each of the nodes involved improves efficiency, hence the need for Partial Query. A Partial Query only selects the results stored in the part of a TsFile delimited by ```QueryConstant.PARTITION_START_OFFSET``` and ```QueryConstant.PARTITION_END_OFFSET```.

### QueryDataSet Interface

The query performed above returns a `QueryDataSet` object.

Here are the interfaces most useful to users.


* `boolean hasNext();`

    Return true if this dataset still has elements.
* `List<Path> getPaths()`

    Get the paths in this data set.
* `List<TSDataType> getDataTypes();` 

   Get the data types. `TSDataType` is an enum class whose value is one of the following:
   
       BOOLEAN,
       INT32,
       INT64,
       FLOAT,
       DOUBLE,
       TEXT;
 * `RowRecord next() throws IOException;`

    Get the next record.
    
    The class `RowRecord` consists of a `long` timestamp and a `List<Field>` holding the data of the different sensors.
    Use these two getter methods to access them:
    
    ```
    long getTimestamp();
    List<Field> getFields();
    ```
    
    To get data from one Field, use these methods:
    
    ```
    TSDataType getDataType();
    Object getObjectValue();
    ```

#### Example for reading an existing TsFile


You should first install TsFile to your local Maven repository.


A more thorough example with a query statement can be found at 
`/example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileRead.java`

```java
package org.apache.iotdb.tsfile;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.iotdb.tsfile.read.ReadOnlyTsFile;
import org.apache.iotdb.tsfile.read.TsFileSequenceReader;
import org.apache.iotdb.tsfile.read.common.Path;
import org.apache.iotdb.tsfile.read.expression.IExpression;
import org.apache.iotdb.tsfile.read.expression.QueryExpression;
import org.apache.iotdb.tsfile.read.expression.impl.BinaryExpression;
import org.apache.iotdb.tsfile.read.expression.impl.GlobalTimeExpression;
import org.apache.iotdb.tsfile.read.expression.impl.SingleSeriesExpression;
import org.apache.iotdb.tsfile.read.filter.TimeFilter;
import org.apache.iotdb.tsfile.read.filter.ValueFilter;
import org.apache.iotdb.tsfile.read.query.dataset.QueryDataSet;

/**
 * This class shows how to read the TsFile named "test.tsfile".
 * The file "test.tsfile" is generated by one of the writing examples
 * (e.g. TsFileWriteWithTSRecord); run it first to generate test.tsfile.
 */
public class TsFileRead {
  private static void queryAndPrint(ArrayList<Path> paths, ReadOnlyTsFile readTsFile, IExpression statement)
          throws IOException {
    QueryExpression queryExpression = QueryExpression.create(paths, statement);
    QueryDataSet queryDataSet = readTsFile.query(queryExpression);
    while (queryDataSet.hasNext()) {
      System.out.println(queryDataSet.next());
    }
    System.out.println("------------");
  }

  public static void main(String[] args) throws IOException {

    // file path
    String path = "test.tsfile";

    // create reader and get the readTsFile interface
    TsFileSequenceReader reader = new TsFileSequenceReader(path);
    ReadOnlyTsFile readTsFile = new ReadOnlyTsFile(reader);
    // use these paths(all sensors) for all the queries
    ArrayList<Path> paths = new ArrayList<>();
    paths.add(new Path("device_1.sensor_1"));
    paths.add(new Path("device_1.sensor_2"));
    paths.add(new Path("device_1.sensor_3"));

    // no query statement
    queryAndPrint(paths, readTsFile, null);

    // close the reader when you are done
    reader.close();
  }
}

```

## User-specified config file path

The default config file `tsfile-format.properties.template` is located in the `/tsfile/src/main/resources` directory. If you want to use your own path, you can call:
```
System.setProperty(TsFileConstant.TSFILE_CONF, "your config file path");
```
and then call:
```
TSFileConfig config = TSFileDescriptor.getInstance().getConfig();
```

## Bloom filter

A Bloom filter checks whether a given time series is in the TsFile before loading metadata. This can improve the performance of loading metadata by skipping TsFiles that do not contain the specified time series.
If you want to learn more about its mechanism, you can refer to the [wiki page of bloom filter](https://en.wikipedia.org/wiki/Bloom_filter).

### Configuration
You can control the false positive rate of the Bloom filter with the following parameter in the config file `tsfile-format.properties`, which is located in the `/server/src/assembly/resources/conf` directory:
```
# The acceptable error rate of bloom filter, should be in [0.01, 0.1], default is 0.05
bloom_filter_error_rate=0.05
```
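
For readers unfamiliar with the mechanism, here is a minimal textbook Bloom filter. It is not the implementation used inside TsFile: the class `BloomFilterSketch`, the bit-array size, the number of hash functions and the hashing scheme are all assumptions chosen for brevity. It shows the property the feature relies on: a negative answer is always correct, while a positive answer may occasionally be wrong.

```java
import java.util.BitSet;

// Illustrative only: a textbook Bloom filter, not the implementation used inside TsFile.
class BloomFilterSketch {
  private final BitSet bits;
  private final int size;
  private final int hashCount;

  BloomFilterSketch(int size, int hashCount) {
    this.bits = new BitSet(size);
    this.size = size;
    this.hashCount = hashCount;
  }

  // Derives the i-th bit position from two base hashes (double hashing).
  private int position(String key, int i) {
    int h1 = key.hashCode();
    int h2 = (h1 >>> 16) | 1; // force a non-zero step between positions
    return Math.floorMod(h1 + i * h2, size);
  }

  void add(String seriesPath) {
    for (int i = 0; i < hashCount; i++) {
      bits.set(position(seriesPath, i));
    }
  }

  // false means "definitely absent"; true means "possibly present".
  boolean mightContain(String seriesPath) {
    for (int i = 0; i < hashCount; i++) {
      if (!bits.get(position(seriesPath, i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
    filter.add("device_1.sensor_1");
    // Always true for a key that was added.
    System.out.println(filter.mightContain("device_1.sensor_1"));
    // Usually false for a key that was not added, but may be a false positive.
    System.out.println(filter.mightContain("device_9.sensor_9"));
  }
}
```

A lower error rate requires more bits per stored key, which is the trade-off behind `bloom_filter_error_rate`.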