# Use JuiceFS on Hadoop Ecosystem

## Table of Contents

- [Requirements](#requirements)
  * [1. Hadoop and related components](#1-hadoop-and-related-components)
  * [2. User permissions](#2-user-permissions)
  * [3. File system](#3-file-system)
  * [4. Memory](#4-memory)
- [Client compilation](#client-compilation)
  * [Linux and macOS](#linux-and-macos)
  * [Windows](#windows)
- [Deploy the client](#deploy-the-client)
  * [Big Data Platforms](#big-data-platforms)
  * [Community Components](#community-components)
  * [Client Configurations](#client-configurations)
    + [Core Configurations](#core-configurations)
    + [Cache Configurations](#cache-configurations)
    + [I/O Configurations](#io-configurations)
    + [Other Configurations](#other-configurations)
    + [Multiple file systems configuration](#multiple-file-systems-configuration)
    + [Configuration Example](#configuration-example)
- [Configuration in Hadoop](#configuration-in-hadoop)
  * [CDH6](#cdh6)
  * [HDP](#hdp)
  * [Flink](#flink)
  * [Hudi](#hudi)
  * [Restart Services](#restart-services)
- [Environmental Verification](#environmental-verification)
  * [Hadoop](#hadoop)
  * [Hive](#hive)
- [Monitoring metrics collection](#monitoring-metrics-collection)
- [Benchmark](#benchmark)
  * [1. Local Benchmark](#1-local-benchmark)
    + [Metadata](#metadata)
    + [I/O Performance](#io-performance)
  * [2. Distributed Benchmark](#2-distributed-benchmark)
    + [Metadata](#metadata-1)
    + [I/O Performance](#io-performance-1)
- [FAQ](#faq)

----

JuiceFS provides a [Hadoop-compatible FileSystem](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/introduction.html) via its Hadoop Java SDK. Various applications in the Hadoop ecosystem can smoothly use JuiceFS to store data without changing their code.

## Requirements

### 1. Hadoop and related components

JuiceFS Hadoop Java SDK is compatible with Hadoop 2.x, Hadoop 3.x, and a wide variety of components in the Hadoop ecosystem.

### 2. User permissions

JuiceFS uses a local mapping of `user` and `UID`. So you should [sync all the needed users and their UIDs](sync_accounts_between_multiple_hosts.md) across the whole Hadoop cluster to avoid permission errors. You can also specify a global user list and user group file; please refer to the [relevant configurations](#other-configurations).

### 3. File system

You should first create at least one JuiceFS file system to provide storage for components related to the Hadoop ecosystem through the JuiceFS Java SDK. When deploying the Java SDK, specify the metadata engine address of the created file system in the configuration file.


To create a file system, please refer to the [JuiceFS Quick Start Guide](quick_start_guide.md).

> **Note**: If you want to use JuiceFS in a distributed environment, plan the object storage and database carefully when creating the file system, to ensure that they can be accessed by every node in the cluster.

### 4. Memory

JuiceFS Hadoop Java SDK may need up to 4 * [`juicefs.memory-size`](#io-configurations) of extra off-heap memory. By default, up to 1.2 GB of additional memory is required (depending on write load).
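The sizing rule above can be sketched as a quick calculation (a sketch for illustration; per the I/O configuration table, `juicefs.memory-size` defaults to 300 MiB):

```python
# Estimate the maximum extra off-heap memory the SDK may use,
# following the documented rule: at most 4 * juicefs.memory-size.
def max_offheap_mib(memory_size_mib: int = 300) -> int:
    # 300 MiB is the documented default of juicefs.memory-size
    return 4 * memory_size_mib

print(max_offheap_mib())  # 1200 MiB, i.e. about 1.2 GB
```

Increase the estimate accordingly if you raise `juicefs.memory-size` for write-heavy workloads.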

## Client compilation

Compilation depends on the following tools:

- [Go](https://golang.org/) 1.15+
- JDK 8+
- [Maven](https://maven.apache.org/) 3.3+
- git
- make
- GCC 5.4+

> **Note**: If Ceph RADOS is used to store data, you need to install `librados-dev` and build `libjfs.so` with `-tags ceph`.

### Linux and macOS

Clone the repository:

```shell
$ git clone https://github.com/juicedata/juicefs.git
```

Enter the directory and compile:

```shell
$ cd juicefs/sdk/java
$ make
```

After compilation, you can find the compiled JAR files in the `sdk/java/target` directory, in two versions:

- With bundled third-party dependencies: `juicefs-hadoop-X.Y.Z.jar`
- Without third-party dependencies: `original-juicefs-hadoop-X.Y.Z.jar`

It is recommended to use the version that includes the third-party dependencies.

### Windows


The client for the Windows environment needs to be obtained through cross-compilation on Linux or macOS. The compilation depends on [mingw-w64](https://www.mingw-w64.org/), which needs to be installed first.

The steps are the same as compiling on Linux or macOS. For example, on Ubuntu, install the `mingw-w64` package first to resolve the dependency:

```shell
$ sudo apt install mingw-w64
```

Clone and enter the JuiceFS source code directory, then run the following commands to compile:

```shell
$ cd juicefs/sdk/java
$ make win
```

> **Note**: No matter which system the client is compiled on, the resulting JAR file has the same name and can only be deployed on the matching system. For example, a JAR compiled on Linux can only be used on Linux. In addition, since the compiled package depends on glibc, it is recommended to compile on a system with an older glibc version to ensure better compatibility.

## Deploy the client

To enable each component of the Hadoop ecosystem to correctly identify JuiceFS, the following configurations are required:

1. Place the compiled JAR file and `$JAVA_HOME/lib/tools.jar` into the `classpath` of each component. The installation paths of common big data platforms and components are shown in the table below.
2. Add the JuiceFS configurations to the configuration file of each Hadoop ecosystem component (usually `core-site.xml`), see [Client Configurations](#client-configurations) for details.

It is recommended to place the JAR file in a fixed location and reference it from the other locations through symbolic links.


### Big Data Platforms

| Name              | Installing Paths                                             |
| ----------------- | ------------------------------------------------------------ |
| CDH               | `/opt/cloudera/parcels/CDH/lib/hadoop/lib`<br>`/opt/cloudera/parcels/CDH/spark/jars`<br>`/var/lib/impala` |
| HDP               | `/usr/hdp/current/hadoop-client/lib`<br>`/usr/hdp/current/hive-client/auxlib`<br>`/usr/hdp/current/spark2-client/jars` |
| Amazon EMR        | `/usr/lib/hadoop/lib`<br>`/usr/lib/spark/jars`<br>`/usr/lib/hive/auxlib` |
| Alibaba Cloud EMR | `/opt/apps/ecm/service/hadoop/*/package/hadoop*/share/hadoop/common/lib`<br>`/opt/apps/ecm/service/spark/*/package/spark*/jars`<br>`/opt/apps/ecm/service/presto/*/package/presto*/plugin/hive-hadoop2`<br>`/opt/apps/ecm/service/hive/*/package/apache-hive*/lib`<br>`/opt/apps/ecm/service/impala/*/package/impala*/lib` |
| Tencent Cloud EMR | `/usr/local/service/hadoop/share/hadoop/common/lib`<br>`/usr/local/service/presto/plugin/hive-hadoop2`<br>`/usr/local/service/spark/jars`<br>`/usr/local/service/hive/auxlib` |
| UCloud UHadoop    | `/home/hadoop/share/hadoop/common/lib`<br>`/home/hadoop/hive/auxlib`<br>`/home/hadoop/spark/jars`<br>`/home/hadoop/presto/plugin/hive-hadoop2` |
| Baidu Cloud EMR   | `/opt/bmr/hadoop/share/hadoop/common/lib`<br>`/opt/bmr/hive/auxlib`<br>`/opt/bmr/spark2/jars` |

### Community Components

| Name   | Installing Paths                     |
| ------ | ------------------------------------ |
| Spark  | `${SPARK_HOME}/jars`                 |
| Presto | `${PRESTO_HOME}/plugin/hive-hadoop2` |
| Flink  | `${FLINK_HOME}/lib`                  |


### Client Configurations

Please refer to the following tables to set the relevant parameters of the JuiceFS file system and write them into the configuration file, which is generally `core-site.xml`.

#### Core Configurations


| Configuration                    | Default Value                | Description                                                  |
| -------------------------------- | ---------------------------- | ------------------------------------------------------------ |
| `fs.jfs.impl`                    | `io.juicefs.JuiceFileSystem` | Specify the storage implementation to be used. By default, the `jfs://` scheme is used. If you want to use a different scheme (e.g. `cfs://`), just modify it to `fs.cfs.impl`. No matter which scheme you use, it always accesses the data in JuiceFS. |
| `fs.AbstractFileSystem.jfs.impl` | `io.juicefs.JuiceFS`         | Specify the storage implementation to be used. By default, the `jfs://` scheme is used. If you want to use a different scheme (e.g. `cfs://`), just modify it to `fs.AbstractFileSystem.cfs.impl`. No matter which scheme you use, it always accesses the data in JuiceFS. |
| `juicefs.meta`                   |                              | Specify the metadata engine address of the pre-created JuiceFS file system. You can configure multiple file systems for the client at the same time through the format `juicefs.{vol_name}.meta`. Refer to ["Multiple file systems configuration"](#multiple-file-systems-configuration). |

#### Cache Configurations

| Configuration                | Default Value | Description                                                  |
| ---------------------------- | ------------- | ------------------------------------------------------------ |
| `juicefs.cache-dir`          |               | Directory paths of local cache. Use colon to separate multiple paths. Wildcards in paths are also supported. **It's recommended to create these directories manually and set `0777` permission so that different applications can share the cache data.** |
| `juicefs.cache-size`         | 0             | Maximum size of local cache in MiB. It's the total size when multiple cache directories are set. |
| `juicefs.cache-full-block`   | `true`        | Whether to cache every read block, `false` means only cache random/small read blocks. |
| `juicefs.free-space`         | 0.1           | Minimum free space ratio of cache directories                |
| `juicefs.attr-cache`         | 0             | Expiration time of attribute cache in seconds                |
| `juicefs.entry-cache`        | 0             | Expiration time of file entry cache in seconds               |
| `juicefs.dir-entry-cache`    | 0             | Expiration time of directory entry cache in seconds          |
| `juicefs.discover-nodes-url` |               | The URL to discover cluster nodes, refreshed every 10 minutes.<br /><br />YARN: `yarn`<br />Spark Standalone: `http://spark-master:web-ui-port/json/`<br />Spark ThriftServer: `http://thrift-server:4040/api/v1/applications/`<br />Presto: `http://coordinator:discovery-uri-port/v1/service/presto/` |


#### I/O Configurations

| Configuration            | Default Value | Description                                     |
| ------------------------ | ------------- | ----------------------------------------------- |
| `juicefs.max-uploads`    | 20            | The max number of connections to upload         |
| `juicefs.get-timeout`    | 5             | The max number of seconds to download an object |
| `juicefs.put-timeout`    | 60            | The max number of seconds to upload an object   |
| `juicefs.memory-size`    | 300           | Total read/write buffering in MiB               |
| `juicefs.prefetch`       | 1             | Prefetch N blocks in parallel                   |
| `juicefs.upload-limit`   | 0             | Bandwidth limit for upload in Mbps              |
| `juicefs.download-limit` | 0             | Bandwidth limit for download in Mbps            |


#### Other Configurations

| Configuration             | Default Value | Description                                                  |
| ------------------------- | ------------- | ------------------------------------------------------------ |
| `juicefs.debug`           | `false`       | Whether to enable debug logging                              |
| `juicefs.access-log`      |               | Access log path. Ensure the Hadoop application has write permission, e.g. `/tmp/juicefs.access.log`. The log file will rotate automatically to keep at most 7 files. |
| `juicefs.superuser`       | `hdfs`        | The super user                                               |
| `juicefs.users`           | `null`        | The path of the username and UID list file, e.g. `jfs://name/etc/users`. The file format is `<username>:<UID>`, one user per line. |
| `juicefs.groups`          | `null`        | The path of the group name, GID and group members list file, e.g. `jfs://name/etc/groups`. The file format is `<group-name>:<GID>:<username1>,<username2>`, one group per line. |
| `juicefs.umask`           | `null`        | The umask used when creating files and directories (e.g. `0022`), the default value is `fs.permissions.umask-mode`. |
| `juicefs.push-gateway`    |               | [Prometheus Pushgateway](https://github.com/prometheus/pushgateway) address, in the format `<host>:<port>`. |
| `juicefs.push-interval`   | 10            | Prometheus push interval in seconds                          |
| `juicefs.push-auth`       |               | [Prometheus basic auth](https://prometheus.io/docs/guides/basic-auth) information, in the format `<username>:<password>`. |
| `juicefs.fast-resolve`    | `true`        | Whether to enable faster metadata lookup using Redis Lua scripts |
| `juicefs.no-usage-report` | `false`       | Whether to disable usage reporting. JuiceFS only collects anonymous usage data (e.g. version number); no user data or any sensitive data will be collected. |
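As a sketch of the `juicefs.users` and `juicefs.groups` file formats described above, the following hypothetical helpers (illustrative names, not part of the SDK) build file contents in the expected `<username>:<UID>` and `<group-name>:<GID>:<username1>,<username2>` forms:

```python
# Build the contents of the user and group list files in the formats
# the SDK expects: one user or group per line.
def format_users(users):
    # users: mapping of username -> UID
    return "\n".join(f"{name}:{uid}" for name, uid in users.items())

def format_groups(groups):
    # groups: mapping of group name -> (GID, list of member usernames)
    return "\n".join(
        f"{name}:{gid}:{','.join(members)}"
        for name, (gid, members) in groups.items()
    )

print(format_users({"alice": 1001, "bob": 1002}))
print(format_groups({"analysts": (2001, ["alice", "bob"])}))
```

The resulting files can then be uploaded to the paths configured in `juicefs.users` and `juicefs.groups`.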

#### Multiple file systems configuration

When multiple JuiceFS file systems need to be used at the same time, all of the above configuration items can be specified for a specific file system. You only need to put the file system name in the middle of the configuration item, such as `jfs1` and `jfs2` in the following example:

```xml
<property>
  <name>juicefs.jfs1.meta</name>
  <value>redis://jfs1.host:port/1</value>
</property>
<property>
  <name>juicefs.jfs2.meta</name>
  <value>redis://jfs2.host:port/1</value>
</property>
```
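The naming scheme can be sketched as a simple lookup rule, assuming a volume-scoped key `juicefs.{vol_name}.{key}` takes precedence over the global `juicefs.{key}` (an illustrative sketch of the assumed behavior, not the SDK's actual implementation):

```python
# Resolve a JuiceFS option for a given volume name: prefer the
# volume-scoped key, fall back to the global key.
def resolve(conf: dict, vol: str, key: str):
    return conf.get(f"juicefs.{vol}.{key}", conf.get(f"juicefs.{key}"))

conf = {
    "juicefs.jfs1.meta": "redis://jfs1.host:6379/1",
    "juicefs.jfs2.meta": "redis://jfs2.host:6379/1",
    "juicefs.cache-size": "1024",  # global option shared by both volumes
}
print(resolve(conf, "jfs1", "meta"))        # volume-scoped value
print(resolve(conf, "jfs1", "cache-size"))  # falls back to the global key
```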


#### Configuration Example

The following is a commonly used configuration example. Please replace the `{HOST}`, `{PORT}` and `{DB}` variables in the `juicefs.meta` configuration with actual values.

```xml
<property>
  <name>fs.jfs.impl</name>
  <value>io.juicefs.JuiceFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.jfs.impl</name>
  <value>io.juicefs.JuiceFS</value>
</property>
<property>
  <name>juicefs.meta</name>
  <value>redis://{HOST}:{PORT}/{DB}</value>
</property>
<property>
  <name>juicefs.cache-dir</name>
  <value>/data*/jfs</value>
</property>
<property>
  <name>juicefs.cache-size</name>
  <value>1024</value>
</property>
<property>
  <name>juicefs.access-log</name>
  <value>/tmp/juicefs.access.log</value>
</property>
```


## Configuration in Hadoop

Please refer to the aforementioned configuration tables and add the configuration parameters to the Hadoop configuration file `core-site.xml`.

### CDH6

If you are using CDH 6, in addition to modifying `core-site.xml`, you also need to modify `mapreduce.application.classpath` through the YARN service interface, adding:

```shell
$HADOOP_COMMON_HOME/lib/juicefs-hadoop.jar
```


### HDP

In addition to modifying `core-site.xml`, you also need to modify the configuration `mapreduce.application.classpath` through the MapReduce2 service interface and add the following at the end (variables do not need to be replaced):

```shell
/usr/hdp/${hdp.version}/hadoop/lib/juicefs-hadoop.jar
```


### Flink

Add the configuration parameters to `conf/flink-conf.yaml`. If you only use JuiceFS in Flink, you don't need to configure JuiceFS in the Hadoop environment; configuring the Flink client is enough.

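For example, the same keys used in `core-site.xml` could be placed in `conf/flink-conf.yaml` (a sketch under that assumption; replace the `{HOST}`, `{PORT}` and `{DB}` placeholders with actual values):

```yaml
fs.jfs.impl: io.juicefs.JuiceFileSystem
juicefs.meta: redis://{HOST}:{PORT}/{DB}
juicefs.cache-dir: /data*/jfs
juicefs.cache-size: 1024
```
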
### Hudi

> **Note**: The latest version of Hudi (v0.9.0) does not yet support JuiceFS, you need to compile the latest master branch yourself.

Please refer to ["Hudi Official Documentation"](https://hudi.apache.org/docs/next/jfs_hoodie) to learn how to configure JuiceFS.


### Restart Services

When the following components need to access JuiceFS, they should be restarted.

> **Note**: Before restarting, you need to confirm that the JuiceFS related configuration has been written to the configuration file of each component. Usually you can find it in `core-site.xml` on the machine where the component's service is deployed.

| Components | Services                   |
| ---------- | -------------------------- |
| Hive       | HiveServer<br />Metastore  |
| Spark      | ThriftServer               |
| Presto     | Coordinator<br />Worker    |
| Impala     | Catalog Server<br />Daemon |
| HBase      | Master<br />RegionServer   |

HDFS, Hue, ZooKeeper and other services don't need to be restarted.

If a `Class io.juicefs.JuiceFileSystem not found` or `No FileSystem for scheme: jfs` exception occurs after restarting, refer to the [FAQ](#faq).

## Environmental Verification

After deploying the JuiceFS Java SDK, you can use the following methods to verify that the deployment was successful.

### Hadoop

```bash
$ hadoop fs -ls jfs://{JFS_NAME}/
```

> **Note**: `JFS_NAME` is the volume name specified when you formatted the JuiceFS file system.

### Hive

```sql
CREATE TABLE IF NOT EXISTS person
(
  name STRING,
  age INT
) LOCATION 'jfs://{JFS_NAME}/tmp/person';
```

## Monitoring metrics collection

JuiceFS Hadoop Java SDK supports reporting metrics to [Prometheus Pushgateway](https://github.com/prometheus/pushgateway); you can then use [Grafana](https://grafana.com) and the [dashboard template](grafana_template.json) to visualize these metrics.

Enable metrics reporting through following configurations:

```xml
<property>
  <name>juicefs.push-gateway</name>
  <value>host:port</value>
</property>
```

**Note**: Each process using the JuiceFS Hadoop Java SDK has a unique metric, and Pushgateway always remembers all collected metrics. This causes metrics to accumulate continuously, taking up too much memory and slowing down Prometheus scraping. It is recommended to regularly clean up metrics whose `job` is `juicefs` on the Pushgateway.

It is recommended to run the following command once every hour to clean up. The running Hadoop Java SDK will continue to update the metrics after they are cleared, so this basically does not affect usage.

```bash
$ curl -X DELETE http://host:9091/metrics/job/juicefs
```
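To automate this, a crontab entry could run the cleanup every hour (`host:9091` is a placeholder for your Pushgateway address):

```
0 * * * * curl -X DELETE http://host:9091/metrics/job/juicefs
```
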

For a description of all monitoring metrics, please refer to [JuiceFS Metrics](p8s_metrics.md).

## Benchmark

The following shows how to use the stress-testing tools built into the JuiceFS client to benchmark a successfully deployed client environment.

### 1. Local Benchmark

#### Metadata

- **create**

  ```shell
  hadoop jar juicefs-hadoop.jar nnbench create -files 10000 -baseDir jfs://{JFS_NAME}/tmp/benchmarks/NNBench -local
  ```

  This command creates 10000 empty files.

- **open**

  ```shell
  hadoop jar juicefs-hadoop.jar nnbench open -files 10000 -baseDir jfs://{JFS_NAME}/tmp/benchmarks/NNBench -local
  ```

  This command opens 10000 files without reading data.

- **rename**

  ```shell
  hadoop jar juicefs-hadoop.jar nnbench rename -files 10000 -baseDir jfs://{JFS_NAME}/tmp/benchmarks/NNBench -local
  ```

- **delete**

  ```shell
  hadoop jar juicefs-hadoop.jar nnbench delete -files 10000 -baseDir jfs://{JFS_NAME}/tmp/benchmarks/NNBench -local
  ```

- **For reference**

| Operation | TPS  | Latency (ms) |
| --------- | ---- | ------------ |
| create    | 644  | 1.55         |
| open      | 3467 | 0.29         |
| rename    | 483  | 2.07         |
| delete    | 506  | 1.97         |

#### I/O Performance

- **sequential write**

  ```shell
  hadoop jar juicefs-hadoop.jar dfsio -write -size 20000 -baseDir jfs://{JFS_NAME}/tmp/benchmarks/DFSIO -local
  ```

- **sequential read**

  ```shell
  hadoop jar juicefs-hadoop.jar dfsio -read -size 20000 -baseDir jfs://{JFS_NAME}/tmp/benchmarks/DFSIO -local
  ```

  When running the read command a second time, the result may be much better than the first run. This is because the data was cached in memory; clean the local disk cache before re-running to measure cold reads.


- **For reference**

| Operation | Throughput (MB/s) |
| --------- | ----------------- |
| write     | 647               |
| read      | 111               |

If the machine's network bandwidth is relatively low, the benchmark can generally reach the network bandwidth bottleneck.

### 2. Distributed Benchmark

The following commands start MapReduce distributed tasks to test metadata and I/O performance. During the test, make sure the cluster has sufficient resources to start the required map tasks.

Computing resources used in this test:

- **Server**: 4 cores and 32 GB memory, burst bandwidth 5 Gbit/s x 3
- **Database**: Alibaba Cloud Redis 5.0 Community 4G Master-Slave Edition

#### Metadata

- **create**

  ```shell
  hadoop jar juicefs-hadoop.jar nnbench create -maps 10 -threads 10 -files 1000 -baseDir jfs://{JFS_NAME}/tmp/benchmarks/NNBench
  ```

  10 map tasks, each with 10 threads; each thread creates 1000 empty files, 100000 files in total.

- **open**

  ```shell
  hadoop jar juicefs-hadoop.jar nnbench open -maps 10 -threads 10 -files 1000 -baseDir jfs://{JFS_NAME}/tmp/benchmarks/NNBench
  ```

  10 map tasks, each with 10 threads; each thread opens 1000 files, 100000 files in total.

- **rename**

  ```shell
  hadoop jar juicefs-hadoop.jar nnbench rename -maps 10 -threads 10 -files 1000 -baseDir jfs://{JFS_NAME}/tmp/benchmarks/NNBench
  ```

  10 map tasks, each with 10 threads; each thread renames 1000 files, 100000 files in total.

- **delete**

  ```shell
  hadoop jar juicefs-hadoop.jar nnbench delete -maps 10 -threads 10 -files 1000 -baseDir jfs://{JFS_NAME}/tmp/benchmarks/NNBench
  ```

  10 map tasks, each with 10 threads; each thread deletes 1000 files, 100000 files in total.

- **For reference**

  - 10 threads

  | Operation | IOPS | Latency (ms) |
  | --------- | ---- | ------------ |
  | create    | 4178 | 2.2          |
  | open      | 9407 | 0.8          |
  | rename    | 3197 | 2.9          |
  | delete    | 3060 | 3.0          |

  - 100 threads

  | Operation | IOPS  | Latency (ms) |
  | --------- | ----  | ------------ |
  | create    | 11773 | 7.9          |
  | open      | 34083 | 2.4          |
  | rename    | 8995  | 10.8         |
  | delete    | 7191  | 13.6         |

#### I/O Performance

- **sequential write**

  ```shell
  hadoop jar juicefs-hadoop.jar dfsio -write -maps 10 -size 10000 -baseDir jfs://{JFS_NAME}/tmp/benchmarks/DFSIO
  ```

  10 map tasks, each writing 10000 MB of random data sequentially.

- **sequential read**

  ```shell
  hadoop jar juicefs-hadoop.jar dfsio -read -maps 10 -size 10000 -baseDir jfs://{JFS_NAME}/tmp/benchmarks/DFSIO
  ```

  10 map tasks, each reading 10000 MB of random data sequentially.


- **For reference**

| Operation | Average throughput (MB/s) | Total Throughput (MB/s) |
| --------- | ------------------------- | ----------------------- |
| write     | 198                       | 1835                    |
| read      | 124                       | 1234                    |

## FAQ


### 1. `Class io.juicefs.JuiceFileSystem not found` exception

It means the JAR file was not loaded. You can verify it by `lsof -p {pid} | grep juicefs`.

You should check whether the JAR file was placed in the right location and whether other users have read permission on it.

Some Hadoop distributions also need `mapred-site.xml` to be modified, appending the JAR file path to the end of the parameter `mapreduce.application.classpath`.
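For example, the classpath change could look like the following `mapred-site.xml` fragment (an illustrative sketch; the existing classpath value and the JAR location depend on your distribution):

```xml
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,$HADOOP_COMMON_HOME/lib/juicefs-hadoop.jar</value>
</property>
```
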

### 2. `No FileSystem for scheme: jfs` exception

It means the JuiceFS Hadoop Java SDK was not configured properly. You need to check whether the JuiceFS related configuration is present in the `core-site.xml` of the component.