Pegasus is a distributed key-value storage system developed and maintained by the Xiaomi Cloud Storage Team, targeting
high availability, high performance, strong consistency, and ease of use. The original motivation of this project is to replace
[Apache HBase](https://hbase.apache.org/) for users who only need a simple key-value schema but require low latency and high availability.
It is based on a modified [rDSN](https://github.com/XiaoMi/rdsn) (originally [Microsoft/rDSN](https://github.com/Microsoft/rDSN)) framework,
and uses a modified [RocksDB](https://github.com/xiaomi/pegasus-rocksdb) (originally [facebook/RocksDB](https://github.com/facebook/rocksdb)) as the underlying storage engine.
The consensus algorithm it uses is [PacificA](https://www.microsoft.com/en-us/research/publication/pacifica-replication-in-log-based-distributed-storage-systems/).
## Features
* High performance
Here are several key aspects that make Pegasus a high-performance storage system:
- Implemented in C++.
- [Staged event-driven architecture](https://en.wikipedia.org/wiki/Staged_event-driven_architecture), an architecture also adopted by Nginx.
- A high-performance storage engine based on [RocksDB](https://github.com/facebook/rocksdb), with slight modifications to support fast learning.
* High availability
Unlike Bigtable/HBase, Pegasus adopts a non-layered replication architecture: persistent data does not depend on an external DFS such as GFS/HDFS, which greatly benefits availability. Meanwhile, the availability problems that HBase suffers from Java GC are entirely eliminated by the use of C++.
We adopt the [PacificA](https://www.microsoft.com/en-us/research/publication/pacifica-replication-in-log-based-distributed-storage-systems/) consensus algorithm to make Pegasus a strongly consistent system.
* Easily scaling out
Load can be balanced dynamically across newly added data nodes by a global load balancer.
* Easy to use
We provide C++ and Java clients with simple interfaces to make Pegasus easy to use.
Pegasus is a distributed key-value storage system which is designed to be:
- **horizontally scalable**: distributed via hash-based partitioning
- **strongly consistent**: ensured by the [PacificA][PacificA] consensus protocol
- **high-performance**: using [RocksDB][pegasus-rocksdb] as the underlying storage engine
- **simple**: well-defined, easy-to-use APIs
Here is a brief explanation of the concepts and terms in the diagram:
* MetaServer: the component in Pegasus that manages the whole cluster. The meta-server plays a role similar to the "HMaster" in HBase.
* Zookeeper: an external dependency of Pegasus. We use ZooKeeper to store the meta state of the cluster and to provide fault tolerance for the meta-server.
* ReplicaServer: the component in Pegasus that serves clients' read/write requests. The replica-server is also the container for replicas.
* Partition/replica: the whole key space is split into several partitions, and each partition has several replicas for fault tolerance. You may refer to the [PacificA](https://www.microsoft.com/en-us/research/publication/pacifica-replication-in-log-based-distributed-storage-systems/) paper for more details.
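To illustrate how partitions and replicas relate, here is a minimal sketch. It is hypothetical, not Pegasus's actual placement logic: in the PacificA style, each partition has one primary replica and several secondaries hosted on different replica-servers, and this toy uses a naive round-robin placement.

```cpp
#include <string>
#include <vector>

// Illustrative only: one primary plus (replica_count - 1) secondaries
// per partition, each on a different replica-server.
struct Partition {
    int id;
    std::string primary;                  // server hosting the primary replica
    std::vector<std::string> secondaries; // servers hosting secondary replicas
};

// Naive round-robin placement across replica-servers, for illustration.
std::vector<Partition> place(int partition_count, int replica_count,
                             const std::vector<std::string> &servers)
{
    std::vector<Partition> result;
    for (int p = 0; p < partition_count; ++p) {
        Partition part{p, servers[p % servers.size()], {}};
        for (int r = 1; r < replica_count; ++r)
            part.secondaries.push_back(servers[(p + r) % servers.size()]);
        result.push_back(part);
    }
    return result;
}
```

The real meta-server also rebalances replicas as servers join and leave; this sketch only shows the static partition-to-server mapping.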
Pegasus has been widely used at Xiaomi and serves millions of requests per second.
It is a mature, active project. We hope to build a diverse developer and user
community and attract contributions from more people.
For more details about design and implementation, please refer to PPTs under [`docs/ppt/`](docs/ppt/).
## Background
HBase was recognized as the only large-scale KV store solution at Xiaomi
until Pegasus came out in 2015, built to address the high latency
that HBase incurs from Java GC and from the RPC overhead of its underlying distributed filesystem.
## Data model & API overview
The data model in Pegasus is (hashkey + sortkey) -> value, in which:
* Hashkey is used for partitioning. Values with different hashkeys may be stored in different partitions.
* Sortkey is used for sorting within a hashkey. Values with the **same** hashkey but **different** sortkeys are stored in the **same partition** and ordered by sortkey. If you use the scan API to scan a single hashkey, you will get the values in lexicographical order of sortkey.
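The model above can be sketched in a few lines of C++. This is an illustrative toy, not the real client API or server internals: the hash function, partition count, and in-memory maps are all stand-ins.

```cpp
#include <functional>
#include <map>
#include <string>

// Toy model of (hashkey + sortkey) -> value addressing.
// The hashkey selects a partition; within one hashkey, entries are
// kept in lexicographical order of sortkey (std::map stands in for
// the sorted on-disk layout).
struct ToyTable {
    int partition_count = 8; // arbitrary for this sketch

    // partition index -> hashkey -> (sortkey -> value, sorted by sortkey)
    std::map<int, std::map<std::string, std::map<std::string, std::string>>>
        partitions;

    int partition_of(const std::string &hashkey) const
    {
        return static_cast<int>(std::hash<std::string>{}(hashkey) %
                                partition_count);
    }

    void set(const std::string &hashkey, const std::string &sortkey,
             const std::string &value)
    {
        partitions[partition_of(hashkey)][hashkey][sortkey] = value;
    }

    std::string get(const std::string &hashkey, const std::string &sortkey)
    {
        return partitions[partition_of(hashkey)][hashkey][sortkey];
    }
};
```

Because all sortkeys of one hashkey live in one partition, a scan over a single hashkey is a local, ordered traversal rather than a cross-partition query.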
Pegasus aims to fill the gap between Redis and HBase: the former
is in-memory and low-latency but does not provide a strong consistency guarantee,
while unlike the latter, Pegasus is written entirely in C++ and its write path
relies only on the local filesystem.
The following diagram shows the data model of Pegasus:
Apart from the performance requirements, we also need a storage system
that ensures multiple levels of data safety and supports fast data migration
between data centers, automatic load balancing, and online partition split.
We open-sourced this project because we know it is far from mature and needs lots of
improvement, so we look forward to your [contribution](docs/contribution.md).
## Contact
If you have more questions, please join our [slack channel](https://join.slack.com/t/pegasus-kv/shared_invite/enQtMjcyMjQzOTk4Njk1LWVkMjlkMGE5Mzg1Y2M3MDc0NGYyYzQ5YzYyMGE0ZjlhMDMyNjU1ZGViYzdjZmUwNjVmNGE0ZDdkMWJiN2Q1MDY).
We use `clang-format` (version 3.9) to format our code. For Ubuntu users, clang-format-3.9 can be installed via `apt-get`:
```
sudo apt-get install clang-format-3.9
```
After installing clang-format, you can format your code according to the `.clang-format` config file in the root of the project.
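For example, a manual invocation might look like the following (the paths are examples; clang-format automatically picks up the nearest `.clang-format` file, so running it from the project root applies the project's style):

```shell
# Format C++ sources and headers in place with the project's .clang-format.
find src -name '*.cpp' -o -name '*.h' 2>/dev/null \
    | xargs -r clang-format-3.9 -i
```

You can also run `clang-format-3.9 -i path/to/file.cpp` on individual files before committing.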
## C++ development guidelines
Basically, we follow the Google C++ style, except that:
* We prefer references rather than pointers for the return values of functions.
* Header inclusion is controlled by `#pragma once`.
The reason for these exceptions is that we develop Pegasus based on Microsoft's open-source project [rDSN](https://github.com/Microsoft/rDSN), and we simply follow its rules. We have since forked that project, and although we have contributed a lot upstream, modifications in our fork are hard to merge back.
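A minimal sketch of the two conventions above (the function and its names are hypothetical, for illustration only):

```cpp
// In a header file, inclusion would be controlled by `#pragma once`
// at the top, rather than by #ifndef/#define include guards.
#include <string>
#include <vector>

// The output is returned through a reference parameter instead of a pointer.
void split(const std::string &input, char sep, std::vector<std::string> &parts)
{
    std::string current;
    for (char c : input) {
        if (c == sep) {
            parts.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    parts.push_back(current);
}
```

Passing the output by reference makes it clear at the call site that the argument cannot be null, which is the main motivation for preferring references over pointers here.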
## Roadmap
You may want to refer to the [roadmap](roadmap.md).