Friday, June 23, 2017

Confluent Kafka Installation

Set up EC2 instance

Spin up an EC2 instance (m4.large or m4.xlarge) with an additional EBS volume attached, then run the following commands to format and mount it

sudo lsblk
sudo file -s /dev/xvdb
sudo mkfs -t ext4 /dev/xvdb
sudo mkdir -p /apps/kafka
sudo mount /dev/xvdb /apps/kafka
sudo useradd kafka
sudo chown -R kafka:kafka /apps/kafka
sudo vi /etc/fstab
/dev/xvdb  /apps/kafka ext4    defaults,nofail        0       2
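
To confirm the volume mounted cleanly and that the fstab entry parses, a quick check (using the device and mount point above):

sudo mount -a          (re-reads /etc/fstab; errors here mean the entry is wrong)
df -h /apps/kafka      (should show /dev/xvdb mounted at /apps/kafka)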

Set up security group and add to EC2 instances

Create a security group 'kafka' with the following ports open:
22 (SSH)
2181 (zookeeper client port)
2888, 3888 (zookeeper internal quorum/election ports)
8081 (schema registry), 8082 (rest proxy), 8083 (connect rest, if used)
9021 (control center rest listeners)
9092 (kafka broker)
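
If you would rather script the security group, a rough sketch with the AWS CLI (the VPC id, group id, and CIDR below are placeholders to substitute; create-security-group prints the GroupId to use in the second command):
-------
aws ec2 create-security-group --group-name kafka \
  --description "Confluent Kafka" --vpc-id vpc-xxxxxxxx

for port in 22 2181 2888 3888 8081 8082 8083 9021 9092; do
  aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx \
    --protocol tcp --port $port --cidr 10.0.0.0/16
done
-------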

Install JDK

Download jdk-8u131-linux-x64.tar.gz and copy to /apps/kafka
tar xvzf jdk-8u131-linux-x64.tar.gz
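
A quick sanity check that the JDK unpacked where ~/.bash_profile (below) will expect it:

/apps/kafka/jdk1.8.0_131/bin/java -version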

Install Kafka platform

Download confluent-3.2.1-2.11.tar.gz and copy to /apps/kafka
tar xvzf confluent-3.2.1-2.11.tar.gz

Update ~/.bash_profile

PATH=$PATH:$HOME/.local/bin:$HOME/bin
FS_HOME=/apps/kafka
JAVA_HOME=$FS_HOME/jdk1.8.0_131
JAVA_OPTS="$JAVA_OPTS -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Addresses=true"
CONFLUENT_HOME=$FS_HOME/confluent-3.2.1
SCRIPTS=$FS_HOME/scripts
LOGS=$CONFLUENT_HOME/logs

PATH=$JAVA_HOME/bin:$CONFLUENT_HOME/bin:$PATH
export PATH JAVA_HOME JAVA_OPTS CONFLUENT_HOME SCRIPTS LOGS

Note: create the scripts and logs directories referenced above ($SCRIPTS and $LOGS)
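
With the variables defined above, that works out to:

mkdir -p /apps/kafka/scripts
mkdir -p /apps/kafka/confluent-3.2.1/logs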

Set up Zookeeper Ensemble


mkdir -p /apps/kafka/zookeeper-data
mkdir -p /apps/kafka/zookeeper-data/dataLog

Update /apps/kafka/confluent-3.2.1/etc/kafka/zookeeper.properties

dataDir=/apps/kafka/zookeeper-data
dataLogDir=/apps/kafka/zookeeper-data/dataLog
clientPort=2181
initLimit=5
syncLimit=2
maxClientCnxns=0
tickTime=2000
server.1={server1-ip}:2888:3888
server.2={server2-ip}:2888:3888
server.3={server3-ip}:2888:3888

Create a file named myid

In the dataDir (/apps/kafka/zookeeper-data), create a myid file containing this server's id, matching the server.1/2/3 entries in zookeeper.properties:
echo 1 > /apps/kafka/zookeeper-data/myid    (on the 1st zk server)
echo 2 > /apps/kafka/zookeeper-data/myid    (on the 2nd zk server)
echo 3 > /apps/kafka/zookeeper-data/myid    (on the 3rd zk server)

Start Zookeeper

Zookeeper requires Java; the start script below picks up JAVA_HOME and the PATH by sourcing ~/.bash_profile.

Create a script start_zookeeper.sh
-------
#!/bin/sh
source ~/.bash_profile
cd $CONFLUENT_HOME/bin
./zookeeper-server-start -daemon ../etc/kafka/zookeeper.properties
-------
Run the start script on each of the three Zookeeper servers.
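
To check that the ensemble formed correctly, Zookeeper's four-letter-word commands can be used on each server (assuming nc is installed):
-------
echo ruok | nc localhost 2181    (should reply imok)
echo srvr | nc localhost 2181    (Mode: should be leader on one server, follower on the others)
-------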

Start Kafka

Create directory
mkdir -p /apps/kafka/kafka-logs

Create a file brokers_zks in the scripts directory ($SCRIPTS) with the broker and zookeeper endpoints
-------
STAGE_KAFKA_BROKERS={server1-ip}:9092,{server2-ip}:9092,{server3-ip}:9092
STAGE_KAFKA_ZKS={server1-ip}:2181,{server2-ip}:2181,{server3-ip}:2181
-------

Create a script start_kafka_broker.sh 
-------
#!/bin/sh
source ~/.bash_profile
source $SCRIPTS/brokers_zks
cd $CONFLUENT_HOME/bin

CONTROL_CENTER_OPTS="--override metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter --override confluent.metrics.reporter.bootstrap.servers=$STAGE_KAFKA_BROKERS --override confluent.metrics.reporter.zookeeper.connect=$STAGE_KAFKA_ZKS --override confluent.metrics.reporter.topic.replicas=1"

./kafka-server-start -daemon ../etc/kafka/server.properties --override broker.id=2 --override log.dirs=/apps/kafka/kafka-logs  --override zookeeper.connect=$STAGE_KAFKA_ZKS $CONTROL_CENTER_OPTS
-------
Note: broker.id must be unique on each broker (2 on this server). You can either pass --override parameters as above or update the properties file directly.
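
If you prefer the properties file, the equivalent settings in ../etc/kafka/server.properties would look roughly like this (broker.id differs on each server; the confluent.metrics.* lines feed Control Center):
-------
broker.id=2
log.dirs=/apps/kafka/kafka-logs
zookeeper.connect={server1-ip}:2181,{server2-ip}:2181,{server3-ip}:2181
metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter
confluent.metrics.reporter.bootstrap.servers={server1-ip}:9092,{server2-ip}:9092,{server3-ip}:9092
confluent.metrics.reporter.zookeeper.connect={server1-ip}:2181,{server2-ip}:2181,{server3-ip}:2181
confluent.metrics.reporter.topic.replicas=1
-------
Once a broker is running on each server, a quick smoke test with a throwaway topic (kafka-topics is in $CONFLUENT_HOME/bin):
-------
kafka-topics --create --zookeeper {server1-ip}:2181 --replication-factor 3 --partitions 3 --topic smoke-test
kafka-topics --describe --zookeeper {server1-ip}:2181 --topic smoke-test
-------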

Start Schema Registry

Update /apps/kafka/confluent-3.2.1/etc/schema-registry/schema-registry.properties
-------
listeners=http://0.0.0.0:8081
kafkastore.connection.url={server1-ip}:2181,{server2-ip}:2181,{server3-ip}:2181
kafkastore.topic=_schemas
debug=false
-------

Create a script start_schema_registry.sh 
-------
#!/bin/sh
source ~/.bash_profile
cd $CONFLUENT_HOME/bin
./schema-registry-start -daemon ../etc/schema-registry/schema-registry.properties
-------
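
Once the registry is up it should answer on port 8081; /subjects returns an empty JSON array ([]) on a fresh install:

curl http://localhost:8081/subjects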

Start Rest Proxy

Update /apps/kafka/confluent-3.2.1/etc/kafka-rest/kafka-rest.properties
-------
id=kafka-rest-test-server1
schema.registry.url=http://0.0.0.0:8081
zookeeper.connect={server1-ip}:2181,{server2-ip}:2181,{server3-ip}:2181
-------
Create a script start_rest_proxy.sh 
-------
#!/bin/sh
source ~/.bash_profile
cd $CONFLUENT_HOME/bin
nohup ./kafka-rest-start ../etc/kafka-rest/kafka-rest.properties > ../logs/nohup-rest-proxy 2>&1 </dev/null &
-------
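
A simple check that the proxy is serving requests is to list topics through it:

curl http://localhost:8082/topics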

Start Control Center
Create directory /apps/kafka/control-center-data
Update /apps/kafka/confluent-3.2.1/etc/confluent-control-center/control-center.properties
-------
zookeeper.connect={server1-ip}:2181,{server2-ip}:2181,{server3-ip}:2181
bootstrap.servers={server1-ip}:9092,{server2-ip}:9092,{server3-ip}:9092
confluent.controlcenter.id=1
confluent.controlcenter.data.dir=/apps/kafka/control-center-data
#confluent.controlcenter.connect.cluster=connect1:8083,connect2:8083,connect3:8083
#confluent.controlcenter.license=/path/to/license/file
-------

Create a script start_control_center.sh 
-------
#!/bin/sh
source ~/.bash_profile
cd $CONFLUENT_HOME/bin
nohup ./control-center-start ../etc/confluent-control-center/control-center.properties > ../logs/nohup-control-center 2>&1 </dev/null &
-------
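
Control Center serves its UI on port 9021 (opened in the security group earlier). It can take a minute or two to start; once up, load http://{server-ip}:9021 in a browser, or for a headless check:

curl -sI http://localhost:9021/ | head -1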

Delete logs

To reset the cluster, create a cleanup script. Note that this removes the Zookeeper transaction logs and all Kafka topic data along with the application logs, so only run it to wipe a test cluster.
-------
#!/bin/sh
source ~/.bash_profile
rm -Rf /apps/kafka/zookeeper-data/dataLog/*
rm -Rf /apps/kafka/confluent-3.2.1/logs/*
rm -Rf /apps/kafka/kafka-logs/*
-------
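
Stop the services before wiping these directories. The Confluent bin directory ships matching stop scripts; a rough sketch, run on each server:
-------
#!/bin/sh
source ~/.bash_profile
cd $CONFLUENT_HOME/bin
./control-center-stop
./kafka-rest-stop
./schema-registry-stop
./kafka-server-stop
./zookeeper-server-stop
-------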