Unverified · Commit 3e13b9b3 authored by Kristi, committed by GitHub

update aws deployment for 2.6.0 (#7668)

### Motivation
I tried to follow the AWS deployment guide at https://pulsar.apache.org/docs/en/deploy-aws/ but found it was quite outdated: it was trying to install Pulsar 2.1.0-incubating. This PR updates it to install 2.6.0.

### Modifications

* Updated the Pulsar version to 2.6.0
  * Fixed download location for 2.6.0
  * Updated config files for 2.6.0
  * Fixed connector installation for 2.6.0
  * Fixed Ansible's yum warning about installing multiple packages
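
The yum warning arises because looping over packages with `with_items` runs one yum transaction per package, which newer Ansible releases flag as deprecated. The warning-free pattern, mirroring the playbook change in the diff below, passes the whole list to `name:`:

```yaml
- name: Install RPM packages
  yum:
    state: latest
    name:        # one transaction for all packages; no per-item loop
      - wget
      - java
      - sysstat
      - vim
```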
Parent c54a47e2
@@ -30,7 +30,7 @@
src: "{{ item.src }}"
fstype: xfs
opts: defaults,noatime,nodiscard
state: present
state: mounted
with_items:
- { path: "/mnt/journal", src: "/dev/nvme0n1" }
- { path: "/mnt/storage", src: "/dev/nvme1n1" }
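
With Ansible's `mount` module, `state: present` only records the entry in `/etc/fstab` without mounting anything, so on a fresh instance the journal and storage filesystems would not actually be available; `state: mounted` writes the fstab entry and mounts the filesystem immediately. A minimal sketch of the difference:

```yaml
- mount:
    path: /mnt/journal
    src: /dev/nvme0n1
    fstype: xfs
    state: present   # fstab entry only; nothing is mounted yet
- mount:
    path: /mnt/journal
    src: /dev/nvme0n1
    fstype: xfs
    state: mounted   # fstab entry plus an immediate mount
```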
@@ -28,20 +28,21 @@
state: directory
with_items: ["/opt/pulsar"]
- name: Install RPM packages
yum: pkg={{ item }} state=latest
with_items:
- wget
- java
- sysstat
- vim
yum:
state: latest
name:
- wget
- java
- sysstat
- vim
- set_fact:
zookeeper_servers: "{{ groups['zookeeper']|map('extract', hostvars, ['ansible_default_ipv4', 'address'])|map('regex_replace', '(.*)', '\\1:2181') | join(',') }}"
service_url: "pulsar://{{ hostvars[groups['proxy'][0]].public_ip }}:6650/"
http_url: "http://{{ hostvars[groups['proxy'][0]].public_ip }}:8080/"
pulsar_version: "2.1.0-incubating"
zookeeper_servers: "{{ groups['zookeeper']|map('extract', hostvars, ['ansible_default_ipv4', 'address'])|map('regex_replace', '^(.*)$', '\\1:2181') | join(',') }}"
service_url: "{{ pulsar_service_url }}"
http_url: "{{ pulsar_web_url }}"
pulsar_version: "2.6.0"
- name: Download Pulsar binary package
unarchive:
src: http://archive.apache.org/dist/incubator/pulsar/pulsar-{{ pulsar_version }}/apache-pulsar-{{ pulsar_version }}-bin.tar.gz
src: https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=pulsar/pulsar-{{ pulsar_version }}/apache-pulsar-{{ pulsar_version }}-bin.tar.gz
remote_src: yes
dest: /opt/pulsar
extra_opts: ["--strip-components=1"]
@@ -123,12 +124,45 @@
connection: ssh
become: true
tasks:
- name: Download Pulsar IO package
unarchive:
src: http://archive.apache.org/dist/incubator/pulsar/pulsar-{{ pulsar_version }}/apache-pulsar-io-connectors-{{ pulsar_version }}-bin.tar.gz
remote_src: yes
dest: /opt/pulsar
extra_opts: ["--strip-components=1"]
- name: Create connectors directory
file:
path: "/opt/pulsar/{{ item }}"
state: directory
loop:
- connectors
- name: Download Pulsar IO packages
get_url:
url: https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=pulsar/pulsar-{{ pulsar_version }}/connectors/pulsar-io-{{ item }}-{{ pulsar_version }}.nar
dest: /opt/pulsar/connectors/pulsar-io-{{ item }}-{{ pulsar_version }}.nar
loop:
# - aerospike
# - canal
# - cassandra
# - data-generator
# - debezium-mongodb
# - debezium-mysql
# - debezium-postgres
# - dynamodb
# - elastic-search
# - file
# - flume
# - hbase
# - hdfs2
# - hdfs3
# - influxdb
# - jdbc-clickhouse
# - jdbc-mariadb
# - jdbc-postgres
# - jdbc-sqlite
- kafka
# - kafka-connect-adaptor
# - kinesis
# - mongo
# - netty
# - rabbitmq
# - redis
# - solr
# - twitter
- name: Set up broker
template:
src: "../templates/broker.conf"
@@ -52,8 +52,8 @@ minUsableSizeForIndexFileCreation=1073741824
# Configure a specific hostname or IP address that the bookie should use to advertise itself to
# clients. If not set, bookie will advertised its own IP address or hostname, depending on the
# listeningInterface and `seHostNameAsBookieID settings.
# advertisedAddress=
# listeningInterface and useHostNameAsBookieID settings.
advertisedAddress=
# Whether the bookie allowed to use a loopback interface as its primary
# interface(i.e. the interface it uses to establish its identity)?
@@ -92,7 +92,7 @@ flushInterval=60000
# Whether the bookie should use its hostname to register with the
# co-ordination service(eg: Zookeeper service).
# When false, bookie will use its ipaddress for the registration.
# When false, bookie will use its ip address for the registration.
# Defaults to false.
useHostNameAsBookieID=false
@@ -224,18 +224,18 @@ maxPendingAddRequestsPerThread=10000
auditorPeriodicBookieCheckInterval=86400
# The number of entries that a replication will rereplicate in parallel.
rereplicationEntryBatchSize=5000
rereplicationEntryBatchSize=100
# Auto-replication
# The grace period, in seconds, that the replication worker waits before fencing and
# replicating a ledger fragment that's still being written to upon bookie failure.
# openLedgerRereplicationGracePeriod=30
openLedgerRereplicationGracePeriod=30
# Whether the bookie itself can start auto-recovery service also or not
autoRecoveryDaemonEnabled=true
# How long to wait, in seconds, before starting auto recovery of a lost bookie
# lostBookieRecoveryDelay=0
lostBookieRecoveryDelay=0
#############################################################################
## Netty server settings
@@ -268,28 +268,34 @@ serverTcpNoDelay=true
# The Recv ByteBuf allocator max buf size.
# byteBufAllocatorSizeMax=1048576
# The maximum netty frame size in bytes. Any message received larger than this will be rejected. The default value is 1G.
nettyMaxFrameSizeBytes=5253120
#############################################################################
## Journal settings
#############################################################################
# The journal format version to write.
# Available formats are 1-5:
# Available formats are 1-6:
# 1: no header
# 2: a header section was added
# 3: ledger key was introduced
# 4: fencing key was introduced
# 5: expanding header to 512 and padding writes to align sector size configured by `journalAlignmentSize`
# By default, it is `4`. If you'd like to enable `padding-writes` feature, you can set journal version to `5`.
# 6: persisting explicitLac is introduced
# By default, it is `6`.
# If you'd like to disable persisting ExplicitLac, you can set this config to < `6` and also
# fileInfoFormatVersionToWrite should be set to 0. If there is mismatch then the serverconfig is considered invalid.
# You can disable `padding-writes` by setting journal version back to `4`. This feature is available in 4.5.0
# and onward versions.
# journalFormatVersionToWrite=4
journalFormatVersionToWrite=5
# Max file size of journal file, in mega bytes
# A new journal file will be created when the old one reaches the file size limitation
journalMaxSizeMB=2048
# Max number of old journal file to kept
# Keep a number of old journal files would help data recovery in specia case
# Keep a number of old journal files would help data recovery in special case
journalMaxBackups=5
# How much space should we pre-allocate at a time in the journal.
@@ -345,7 +351,7 @@ ledgerStorageClass=org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage
# For example:
# ledgerDirectories=/tmp/bk1-data,/tmp/bk2-data
#
# Ideally ledger dirs and journal dir are each in a differet device,
# Ideally ledger dirs and journal dir are each in a different device,
# which reduce the contention between random i/o and sequential write.
# It is possible to run with a single disk, but performance will be significantly lower.
ledgerDirectories=data/bookkeeper/ledgers
@@ -360,7 +366,7 @@ ledgerDirectories=data/bookkeeper/ledgers
auditorPeriodicCheckInterval=604800
# Whether sorted-ledger storage enabled (default true)
# sortedLedgerStorageEnabled=ture
# sortedLedgerStorageEnabled=true
# The skip list data size limitation (default 64MB) in EntryMemTable
# skipListSizeLimit=67108864L
@@ -376,9 +382,19 @@ auditorPeriodicCheckInterval=604800
# to gain performance according your requirements.
openFileLimit=0
# The fileinfo format version to write.
# Available formats are 0-1:
# 0: Initial version
# 1: persisting explicitLac is introduced
# By default, it is `1`.
# If you'd like to disable persisting ExplicitLac, you can set this config to 0 and
# also journalFormatVersionToWrite should be set to < 6. If there is mismatch then the
# serverconfig is considered invalid.
fileInfoFormatVersionToWrite=0
# Size of a index page in ledger cache, in bytes
# A larger index page can improve performance writing page to disk,
# which is efficent when you have small number of ledgers and these
# which is efficient when you have small number of ledgers and these
# ledgers have similar number of entries.
# If you have large number of ledgers and each ledger has fewer entries,
# smaller index page would improve memory usage.
@@ -391,7 +407,7 @@ openFileLimit=0
# pageLimit*pageSize should not more than JVM max memory limitation,
# otherwise you would got OutOfMemoryException.
# In general, incrementing pageLimit, using smaller index page would
# gain bettern performance in lager number of ledgers with fewer entries case
# gain better performance in lager number of ledgers with fewer entries case
# If pageLimit is -1, bookie server will use 1/3 of JVM memory to compute
# the limitation of number of index pages.
pageLimit=0
@@ -405,7 +421,7 @@ pageLimit=0
# and garbage collected. Try to read 'BookKeeper Internals' for detail info.
# ledgerManagerFactoryClass=org.apache.bookkeeper.meta.HierarchicalLedgerManagerFactory
# @Drepcated - `ledgerManagerType` is deprecated in favor of using `ledgerManagerFactoryClass`.
# @Deprecated - `ledgerManagerType` is deprecated in favor of using `ledgerManagerFactoryClass`.
# ledgerManagerType=hierarchical
# Root Zookeeper path to store ledger metadata
@@ -429,7 +445,7 @@ entryLogFilePreallocationEnabled=true
# happens on log rotation.
# Flushing in smaller chunks but more frequently reduces spikes in disk
# I/O. Flushing too frequently may also affect performance negatively.
# flushEntrylogBytes=0
flushEntrylogBytes=268435456
# The number of bytes we should use as capacity for BufferedReadChannel. Default is 512 bytes.
readBufferSizeBytes=4096
@@ -462,6 +478,7 @@ minorCompactionThreshold=0.2
# Interval to run minor compaction, in seconds
# If it is set to less than zero, the minor compaction is disabled.
# Note: should be greater than gcWaitTime.
minorCompactionInterval=3600
# Set the maximum number of entries which can be compacted without flushing.
@@ -484,6 +501,7 @@ majorCompactionThreshold=0.5
# Interval to run major compaction, in seconds
# If it is set to less than zero, the major compaction is disabled.
# Note: should be greater than gcWaitTime.
majorCompactionInterval=86400
# Throttle compaction by bytes or by entries.
@@ -521,7 +539,7 @@ readOnlyModeEnabled=true
# Whether the bookie is force started in read only mode or not
# forceReadOnlyBookie=false
# Persiste the bookie status locally on the disks. So the bookies can keep their status upon restarts
# Persist the bookie status locally on the disks. So the bookies can keep their status upon restarts
# @Since 4.6
# persistBookieStatusEnabled=false
@@ -531,7 +549,7 @@ readOnlyModeEnabled=true
# For each ledger dir, maximum disk space which can be used.
# Default is 0.95f. i.e. 95% of disk can be used at most after which nothing will
# be written to that partition. If all ledger dir partions are full, then bookie
# be written to that partition. If all ledger dir partitions are full, then bookie
# will turn to readonly mode if 'readOnlyModeEnabled=true' is set, else it will
# shutdown.
# Valid values should be in between 0 and 1 (exclusive).
@@ -590,6 +608,16 @@ zkEnableSecurity=false
## Server parameters
#############################################################################
# The flag enables/disables starting the admin http server. Default value is 'false'.
httpServerEnabled=false
# The http server port to listen on. Default value is 8080.
# Use `8000` as the port to keep it consistent with prometheus stats provider
httpServerPort=8000
# The http server class
httpServerClass=org.apache.bookkeeper.http.vertx.VertxHttpServer
# Configure a list of server components to enable and load on a bookie server.
# This provides the plugin run extra services along with a bookie server.
#
@@ -605,12 +633,15 @@ zkEnableSecurity=false
# Size of Write Cache. Memory is allocated from JVM direct memory.
# Write cache is used to buffer entries before flushing into the entry log
# For good performance, it should be big enough to hold a sub
dbStorage_writeCacheMaxSizeMb=512
# For good performance, it should be big enough to hold a substantial amount
# of entries in the flush interval
# By default it will be allocated to 1/4th of the available direct memory
dbStorage_writeCacheMaxSizeMb=
# Size of Read cache. Memory is allocated from JVM direct memory.
# This read cache is pre-filled doing read-ahead whenever a cache miss happens
dbStorage_readAheadCacheMaxSizeMb=256
# By default it will be allocated to 1/4th of the available direct memory
dbStorage_readAheadCacheMaxSizeMb=
# How many entries to pre-fill in cache after a read cache miss
dbStorage_readAheadCacheBatchSize=1000
@@ -622,8 +653,8 @@ dbStorage_readAheadCacheBatchSize=1000
# Size of RocksDB block-cache. For best performance, this cache
# should be big enough to hold a significant portion of the index
# database which can reach ~2GB in some cases
# Default is 256 MBytes
dbStorage_rocksDB_blockCacheSize=268435456
# Default is to use 10% of the direct memory size
dbStorage_rocksDB_blockCacheSize=
# Other RocksDB specific tunables
dbStorage_rocksDB_writeBufferSizeMB=64
@@ -17,12 +17,53 @@
# under the License.
#
# Pulsar Client and pulsar-admin configuration
webServiceUrl=http://{{ hostvars[groups['pulsar'][0]].private_ip }}:8080/
brokerServiceUrl=pulsar://{{ hostvars[groups['pulsar'][0]].private_ip }}:6650/
#authPlugin=
#authParams=
#useTls=
# Configuration for pulsar-client and pulsar-admin CLI tools
# URL for Pulsar REST API (for admin operations)
# For TLS:
# webServiceUrl=https://localhost:8443/
webServiceUrl={{ http_url }}
# URL for Pulsar Binary Protocol (for produce and consume operations)
# For TLS:
# brokerServiceUrl=pulsar+ssl://localhost:6651/
brokerServiceUrl={{ service_url }}
# Authentication plugin to authenticate with servers
# e.g. for TLS
# authPlugin=org.apache.pulsar.client.impl.auth.AuthenticationTls
authPlugin=
# Parameters passed to authentication plugin.
# A comma separated list of key:value pairs.
# Keys depend on the configured authPlugin.
# e.g. for TLS
# authParams=tlsCertFile:/path/to/client-cert.pem,tlsKeyFile:/path/to/client-key.pem
authParams=
# Allow TLS connections to servers whose certificate cannot be
# be verified to have been signed by a trusted certificate
# authority.
tlsAllowInsecureConnection=false
# Whether server hostname must match the common name of the certificate
# the server is using.
tlsEnableHostnameVerification=false
#tlsTrustCertsFilePath
# Path for the trusted TLS certificate file.
# This cert is used to verify that any cert presented by a server
# is signed by a certificate authority. If this verification
# fails, then the cert is untrusted and the connection is dropped.
tlsTrustCertsFilePath=
# Enable TLS with KeyStore type configuration in broker.
useKeyStoreTls=false
# TLS KeyStore type configuration: JKS, PKCS12
tlsTrustStoreType=JKS
# TLS TrustStore path
tlsTrustStorePath=
# TLS TrustStore password
tlsTrustStorePassword=
@@ -17,15 +17,39 @@
# under the License.
#
### --- Broker Discovery --- ###
# The ZooKeeper quorum connection string (as a comma-separated list)
zookeeperServers={{ zookeeper_servers }}
# Configuration store connection string (as a comma-separated list)
configurationStoreServers={{ zookeeper_servers }}
# if Service Discovery is Disabled this url should point to the discovery service provider.
brokerServiceURL=
brokerServiceURLTLS=
# These settings are unnecessary if `zookeeperServers` is specified
brokerWebServiceURL=
brokerWebServiceURLTLS=
# If function workers are setup in a separate cluster, configure the following 2 settings
# to point to the function workers cluster
functionWorkerWebServiceURL=
functionWorkerWebServiceURLTLS=
# ZooKeeper session timeout (in milliseconds)
zookeeperSessionTimeoutMs=30000
# ZooKeeper cache expiry time in seconds
zooKeeperCacheExpirySeconds=300
### --- Server --- ###
# Hostname or IP address the service advertises to the outside world.
# If not set, the value of `InetAddress.getLocalHost().getHostname()` is used.
advertisedAddress=
# The port to use for server binary Protobuf requests
servicePort=6650
@@ -42,6 +66,28 @@ webServicePortTls=8443
# to service discovery health checks
statusFilePath=
# Proxy log level, default is 0.
# 0: Do not log any tcp channel info
# 1: Parse and log any tcp channel info and command info without message body
# 2: Parse and log channel info, command info and message body
proxyLogLevel=0
### ---Authorization --- ###
# Role names that are treated as "super-users," meaning that they will be able to perform all admin
# operations and publish/consume to/from all topics (as a comma-separated list)
superUserRoles=
# Whether authorization is enforced by the Pulsar proxy
authorizationEnabled=false
# Authorization provider as a fully qualified class name
authorizationProvider=org.apache.pulsar.broker.authorization.PulsarAuthorizationProvider
# Whether client authorization credentials are forwared to the broker for re-authorization.
# Authentication must be enabled via authenticationEnabled=true for this to take effect.
forwardAuthorizationCredentials=false
### --- Authentication --- ###
# Whether authentication is enabled for the Pulsar proxy
@@ -50,11 +96,10 @@ authenticationEnabled=false
# Authentication provider name list (a comma-separated list of class names)
authenticationProviders=
# Whether authorization is enforced by the Pulsar proxy
authorizationEnabled=false
# When this parameter is not empty, unauthenticated users perform as anonymousUserRole
anonymousUserRole=
# Authorization provider as a fully qualified class name
authorizationProvider=org.apache.pulsar.broker.authorization.PulsarAuthorizationProvider
### --- Client Authentication --- ###
# The three brokerClient* authentication settings below are for the proxy itself and determine how it
# authenticates with Pulsar brokers
@@ -68,15 +113,14 @@ brokerClientAuthenticationParameters=
# The path to trusted certificates used by the Pulsar proxy to authenticate with Pulsar brokers
brokerClientTrustCertsFilePath=
# Role names that are treated as "super-users," meaning that they will be able to perform all admin
# operations and publish/consume to/from all topics (as a comma-separated list)
superUserRoles=
# Whether TLS is enabled when communicating with Pulsar brokers
tlsEnabledWithBroker=false
# Whether client authorization credentials are forwared to the broker for re-authorization.
# Authentication must be enabled via authenticationEnabled=true for this to take effect.
forwardAuthorizationCredentials=false
# Tls cert refresh duration in seconds (set 0 to check on every new connection)
tlsCertRefreshCheckDurationSec=300
##### --- Rate Limiting --- #####
# --- RateLimiting ----
# Max concurrent inbound connections. The proxy will reject requests beyond that.
maxConcurrentInboundConnections=10000
@@ -85,12 +129,9 @@ maxConcurrentLookupRequests=50000
##### --- TLS --- #####
# Whether TLS is enabled for the proxy
# Deprecated - use servicePortTls and webServicePortTls instead
tlsEnabledInProxy=false
# Whether TLS is enabled when communicating with Pulsar brokers
tlsEnabledWithBroker=false
# Path for the TLS certificate file
tlsCertificateFilePath=
@@ -112,10 +153,61 @@ tlsAllowInsecureConnection=false
# Whether the hostname is validated when the proxy creates a TLS connection with brokers
tlsHostnameVerificationEnabled=false
# Specify the tls protocols the broker will use to negotiate during TLS handshake
# (a comma-separated list of protocol names).
# Examples:- [TLSv1.2, TLSv1.1, TLSv1]
tlsProtocols=
# Specify the tls cipher the broker will use to negotiate during TLS Handshake
# (a comma-separated list of ciphers).
# Examples:- [TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256]
tlsCiphers=
# Whether client certificates are required for TLS. Connections are rejected if the client
# certificate isn't trusted.
tlsRequireTrustedClientCertOnConnect=false
##### --- HTTP --- #####
# Http directs to redirect to non-pulsar services.
httpReverseProxyConfigs=
# Http output buffer size. The amount of data that will be buffered for http requests
# before it is flushed to the channel. A larger buffer size may result in higher http throughput
# though it may take longer for the client to see data.
# If using HTTP streaming via the reverse proxy, this should be set to the minimum value, 1,
# so that clients see the data as soon as possible.
httpOutputBufferSize=32768
# Number of threads to use for HTTP requests processing. Default is
# 2 * Runtime.getRuntime().availableProcessors()
httpNumThreads=
### --- Token Authentication Provider --- ###
## Symmetric key
# Configure the secret key to be used to validate auth tokens
# The key can be specified like:
# tokenSecretKey=data:;base64,xxxxxxxxx
# tokenSecretKey=file:///my/secret.key
tokenSecretKey=
## Asymmetric public/private key pair
# Configure the public key to be used to validate auth tokens
# The key can be specified like:
# tokenPublicKey=data:;base64,xxxxxxxxx
# tokenPublicKey=file:///my/public.key
tokenPublicKey=
# The token "claim" that will be interpreted as the authentication "role" or "principal" by AuthenticationProviderToken (defaults to "sub" if blank)
tokenAuthClaim=
# The token audience "claim" name, e.g. "aud", that will be used to get the audience from token.
# If not set, audience will not be verified.
tokenAudienceClaim=
# The token audience stands for this broker. The field `tokenAudienceClaim` of a valid token, need contains this.
tokenAudience=
### --- Deprecated config variables --- ###
@@ -29,6 +29,11 @@ syncLimit=5
dataDir=data/zookeeper
# the port at which the clients will connect
clientPort=2181
# the port at which the admin will listen
admin.enableServer=true
admin.serverPort=9990
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
@@ -44,6 +49,12 @@ autopurge.snapRetainCount=3
# Set to "0" to disable auto purge feature
autopurge.purgeInterval=1
# Requires updates to be synced to media of the transaction log before finishing
# processing the update. If this option is set to 'no', ZooKeeper will not require
# updates to be synced to the media.
# WARNING: it's not recommended to run a production ZK cluster with forceSync disabled.
forceSync=yes
{% for zk in groups['zookeeper'] %}
server.{{ hostvars[zk].zid }}={{ hostvars[zk].private_ip }}:2888:3888
{% endfor %}
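
For a three-node ZooKeeper group, the Jinja loop above expands into one `server.N` line per host; the rendered `zoo.cfg` fragment would look roughly like this (the `zid` values and private IPs are illustrative):

```
server.1=10.0.0.11:2888:3888
server.2=10.0.0.12:2888:3888
server.3=10.0.0.13:2888:3888
```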
@@ -173,7 +173,11 @@ Remember to enter this command just only once. If you attempt to enter this comm
## Run the Pulsar playbook
Once you have created the necessary AWS resources using Terraform, you can install and run Pulsar on the Terraform-created EC2 instances using Ansible. To do so, enter this command:
Once you have created the necessary AWS resources using Terraform, you can install and run Pulsar on the Terraform-created EC2 instances using Ansible.
(Optional) If you want to use any [built-in IO connectors](io-connectors.md), edit the `Download Pulsar IO packages` task in the `deploy-pulsar.yaml` file and uncomment the connectors you want to use.
To run the playbook, enter this command:
```bash
$ ansible-playbook \
@@ -220,4 +224,3 @@ Once you are in the shell, enter the following command:
```
If all of these commands are successful, Pulsar clients can now use your cluster!
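
For reference, once the `set_fact` task has resolved `service_url` and `http_url` from the Terraform outputs, the rendered `client.conf` on each instance should contain concrete endpoints along these lines (the proxy address shown is illustrative):

```
webServiceUrl=http://203.0.113.10:8080/
brokerServiceUrl=pulsar://203.0.113.10:6650/
```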