Build And Install HAWQ

2018-05-14

Official guide:
https://cwiki.apache.org/confluence/display/HAWQ/Build+and+Install#tab-yum

The steps below were carried out on CentOS 7.2.

Dependencies

wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
rpm -ivh epel-release-latest-7.noarch.rpm

Install the dependencies:

yum install -y man passwd sudo tar which git mlocate links make bzip2 net-tools autoconf automake libtool m4 gcc gcc-c++ gdb bison flex gperf maven indent libuuid-devel krb5-devel libgsasl-devel expat-devel libxml2-devel perl-ExtUtils-Embed pam-devel python-devel libcurl-devel snappy-devel thrift-devel libyaml-devel libevent-devel bzip2-devel openssl-devel openldap-devel protobuf-devel readline-devel net-snmp-devel apr-devel libesmtp-devel python-pip json-c-devel java-1.7.0-openjdk-devel lcov cmake3 openssh-clients openssh-server perl-JSON perl-Env
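1. If you cannot paste into the virtual machine and have to type the command by hand, double-check that every dependency actually gets installed.
2. Do not install bison with yum. yum installs bison 3.0, but HAWQ needs bison 2.x, and the build fails with an error otherwise. Download it from GNU: [GNU Bison](https://www.gnu.org/software/bison/).
3. The OpenJDK that yum installs may be a 32-bit JDK, which reports insufficient heap memory on a 64-bit system. The error is usually caused by too little RAM. You can work around it by setting the heap size or by installing a 64-bit JDK; which one is better depends mainly on how much memory the machine has. JDK 7 and 8 both work, but JDK 9 and later fail with errors.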
ln -s /usr/bin/cmake3 /usr/bin/cmake
pip --retries=50 --timeout=300 install pycrypto
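
The bison note above is easy to verify before starting a long build. The following is a minimal sketch, not part of the official guide; the helper names `major_version` and `bison_ok` are made up for illustration, and it assumes the first line of `bison --version` ends with the version number.

```shell
# Hypothetical helper: print the major component of a dotted version string.
major_version() {
    printf '%s\n' "$1" | cut -d. -f1
}

# Hypothetical helper: succeed only when the given bison version is a 2.x release.
bison_ok() {
    [ "$(major_version "$1")" = "2" ]
}

if command -v bison >/dev/null 2>&1; then
    # The first line of the output looks like "bison (GNU Bison) 2.7".
    installed=$(bison --version | head -n 1 | awk '{print $NF}')
    if bison_ok "$installed"; then
        echo "bison $installed is a 2.x release"
    else
        echo "bison $installed found, but HAWQ expects 2.x" >&2
    fi
else
    echo "bison is not installed" >&2
fi
```

If the check reports a 3.x release, build a 2.x release from the GNU sources and make sure it comes first in the PATH.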

OS requirement

  • Use a text editor to edit the /etc/sysctl.conf file. Add or edit each of the following parameter definitions to set the required value:

      kernel.shmmax = 1000000000
      kernel.shmmni = 4096
      kernel.shmall = 4000000000
      kernel.sem = 250 512000 100 2048
      kernel.sysrq = 1
      kernel.core_uses_pid = 1
      kernel.msgmnb = 65536
      kernel.msgmax = 65536
      kernel.msgmni = 2048
      net.ipv4.tcp_syncookies = 0
      net.ipv4.conf.default.accept_source_route = 0
      net.ipv4.tcp_tw_recycle = 1
      net.ipv4.tcp_max_syn_backlog = 200000
      net.ipv4.conf.all.arp_filter = 1
      net.ipv4.ip_local_port_range = 1281 65535
      net.core.netdev_max_backlog = 200000
      vm.overcommit_memory = 2
      fs.nr_open = 3000000
      kernel.threads-max = 798720
      kernel.pid_max = 798720
      # increase network
      net.core.rmem_max=2097152
      net.core.wmem_max=2097152
  • Execute the following command to apply your updated /etc/sysctl.conf file to the operating system configuration:

      sysctl -p
  • Use a text editor to edit the /etc/security/limits.conf file. Add the following definitions in the exact order in which they are listed. (Make sure fs.nr_open = 3000000 has been applied before you edit limits.conf; otherwise you may not be able to ssh to your instance.)

      * soft nofile 2900000
      * hard nofile 2900000
      * soft nproc 131072
      * hard nproc 131072
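
Because a nofile limit above fs.nr_open can lock you out of ssh, it may be worth checking the live kernel values before touching limits.conf. The sketch below reads a few of the parameters from /proc/sys and compares them with the values listed above. The function names (`sysctl_to_proc`, `normalize`, `check_sysctl`, `nofile_fits`) are illustrative and not part of any HAWQ tooling.

```shell
# Map a sysctl key such as "net.core.rmem_max" to its path under /proc/sys.
sysctl_to_proc() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Collapse whitespace runs (the kernel separates multi-value entries
# with tabs) into single spaces and trim both ends.
normalize() {
    printf '%s\n' "$1" | tr -s '[:space:]' ' ' | sed 's/^ *//; s/ *$//'
}

# Compare the live value of one key with the expected value.
check_sysctl() {
    key=$1
    expected=$(normalize "$2")
    path=$(sysctl_to_proc "$key")
    if [ ! -r "$path" ]; then
        echo "SKIP $key (no readable $path on this system)"
        return 0
    fi
    actual=$(normalize "$(cat "$path")")
    if [ "$actual" = "$expected" ]; then
        echo "OK   $key = $actual"
    else
        echo "DIFF $key: expected '$expected', found '$actual'"
        return 1
    fi
}

# Succeed when a proposed nofile limit ($1) does not exceed fs.nr_open ($2).
nofile_fits() {
    [ "$1" -le "$2" ]
}

check_sysctl vm.overcommit_memory 2 || true
check_sysctl kernel.sem "250 512000 100 2048" || true
check_sysctl fs.nr_open 3000000 || true

nr_open=$(cat /proc/sys/fs/nr_open 2>/dev/null || echo 0)
if nofile_fits 2900000 "$nr_open"; then
    echo "nofile 2900000 fits under fs.nr_open=$nr_open"
else
    echo "fs.nr_open=$nr_open is below 2900000; apply sysctl.conf before editing limits.conf"
fi
```

Only three keys are checked here; add more `check_sysctl` lines for any other parameter you care about.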

Build optional extension modules

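gporca is the query optimizer implemented by Greenplum; enable it with `./configure --enable-orca`. pgcrypto is an encryption tool; enable it with `./configure --with-pgcrypto --with-openssl`. For the other extension modules, see the official guide.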

Install Hadoop

Please follow the steps here: Hadoop: Setting up a Single Node Cluster ver 2.9.0.

  • You will also need to set the port for fs.defaultFS to 8020 in etc/hadoop/core-site.xml. (The example in the Hadoop guide sets it to 9000.)
  • HDFS is required, but YARN is optional. YARN is only needed when you want to use it as the global resource manager; HAWQ has its own default resource manager.
  • To avoid unnecessary trouble, use Hadoop 2.9.0, the version named in the official guide.
  • The official Hadoop documentation does not add Hadoop to the PATH, so either run the Hadoop commands from inside its directory or add that directory to your environment variables.
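
The two Hadoop adjustments above (PATH and the fs.defaultFS port) can be sketched as follows. The install location `/opt/hadoop-2.9.0` and the host name `localhost` are assumptions for a single-node setup; the snippet writes the XML to a scratch file so nothing is overwritten until you have reviewed it.

```shell
# Assumed install location; point HADOOP_HOME at your own Hadoop 2.9.0 directory.
HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop-2.9.0}
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"

# Write the configuration to a scratch file first, then copy it over
# $HADOOP_HOME/etc/hadoop/core-site.xml once it looks right.
CORE_SITE=$(mktemp)
cat > "$CORE_SITE" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:8020</value>
    </property>
</configuration>
EOF
echo "wrote $CORE_SITE"
```

The `export PATH` line only affects the current shell; put it in your shell profile to make it permanent.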

Get the HAWQ code and Compile

# The Apache HAWQ source code can be obtained from the following link:
# Apache Repo: https://git-wip-us.apache.org/repos/asf/incubator-hawq.git or
# GitHub Mirror: https://github.com/apache/incubator-hawq.
git clone https://git-wip-us.apache.org/repos/asf/incubator-hawq.git
1. HAWQ can be fetched either from the Apache repository or from the GitHub mirror.
2. The total size is about 160 MB.
# The code directory is incubator-hawq.
CODE_BASE=`pwd`/incubator-hawq
CODE_BASE is the location of the incubator-hawq source. Change into that directory to configure the build.
# Run command to generate makefile.
# Or you could use --prefix=/hawq/install/path to change the Apache HAWQ install path,
# and you can also add some optional components using options (--with-python --with-perl)
./configure
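1. Run the command from inside the incubator-hawq directory.
2. `--prefix` sets the install path; `/usr/local/hawq` is a reasonable choice.
3. Other dependencies are listed in the official guide.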
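
Putting the notes above together with the optional modules mentioned earlier, a configure call might look like the sketch below. The prefix and the chosen options are only an example, and the variable names `PREFIX` and `CONFIGURE_ARGS` are made up here. The script runs configure only when it finds one under CODE_BASE, and otherwise just prints the command.

```shell
# Example values; adjust the prefix and the option list to your needs.
PREFIX=${PREFIX:-/usr/local/hawq}
CONFIGURE_ARGS="--prefix=$PREFIX --enable-orca --with-pgcrypto --with-openssl"
CODE_BASE=${CODE_BASE:-$(pwd)/incubator-hawq}

if [ -x "$CODE_BASE/configure" ]; then
    # CONFIGURE_ARGS is left unquoted on purpose so that it splits into separate options.
    (cd "$CODE_BASE" && ./configure $CONFIGURE_ARGS)
else
    echo "configure not found under $CODE_BASE; would run: ./configure $CONFIGURE_ARGS"
fi
```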
# Run command to build and install
# To build concurrently, run make with the -j option. For example, make -j8
# On Linux system without large memory, you will probably encounter errors like
# "Error occurred during initialization of VM" and/or "Could not reserve enough space for object heap"
# and/or "out of memory", try to set vm.overcommit_memory = 1 temporarily, and/or avoid "-j" build,
# and/or add more memory and then rebuild.
# On macOS, you will probably see this error: "'openssl/ssl.h' file not found".
# "brew link openssl --force" should be able to solve the issue.
make -j8

# Install HAWQ
make install
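1. `-j8` tells make to build in parallel. On a machine with little memory this can trigger errors such as "Could not reserve enough space for object heap"; leave out the `-j` option to avoid them.
2. The simplest fix is always more memory.
3. The official documentation does not mention it, but HAWQ is based on Greenplum, so after installation you may need to change the owner of the install directory to `gpadmin`.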
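
One way to avoid the heap errors without giving up parallel builds entirely is to pick the job count from the available memory. The rule used below (one job per 2 GiB of RAM, capped at the number of cores, never below one) is my own rough assumption and does not come from the HAWQ documentation. `pick_jobs` is a hypothetical helper.

```shell
# Hypothetical helper: choose a make job count from core count and RAM (in KiB).
# Assumed rule of thumb: one job per 2 GiB of RAM, at most one per core, at least one.
pick_jobs() {
    cores=$1
    mem_kib=$2
    by_mem=$((mem_kib / 2097152))    # 2 GiB expressed in KiB
    if [ "$by_mem" -lt 1 ]; then
        by_mem=1
    fi
    if [ "$by_mem" -lt "$cores" ]; then
        echo "$by_mem"
    else
        echo "$cores"
    fi
}

cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
mem_kib=${mem_kib:-0}
njobs=$(pick_jobs "$cores" "$mem_kib")
echo "suggested build command: make -j$njobs"
```

On a small virtual machine this usually ends up as `make -j1`, which matches the advice to avoid `-j` when memory is tight.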

Init/Start/Stop HAWQ

# Before initializing HAWQ, you need to install HDFS and make sure it works.

source /hawq/install/path/greenplum_path.sh

# Besides you need to set password-less ssh on the systems.
# Exchange SSH keys between the hosts host1, host2, and host3:
hawq ssh-exkeys -h host1 -h host2 -h host3
hawq init cluster # after initialization, HAWQ is started by default

# Now you can stop, restart, or start the cluster with one of:
# hawq stop cluster
# hawq restart cluster
# hawq start cluster

# HAWQ master and segments are completely decoupled. So you can also init, start or stop the master and segments separately.
# For example, to init: hawq init master, then hawq init segment
# to stop: hawq stop master, then hawq stop segment
# to start: hawq start master, then hawq start segment
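1. Source greenplum_path.sh so that the HAWQ commands and environment variables are available in the shell.
2. The documentation does not mention it, but initialization presumably requires the cluster to be configured first. That part is not filled in here yet.
3. The HAWQ master and segments are decoupled, so each can be started and stopped separately.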
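
Handling the master and the segments separately can be wrapped in a small helper. The sketch below is a dry run by default: it only prints the `hawq` commands, in the order given in the comments above, and executes them when `DRY_RUN=0` is set. The names `run` and `hawq_each` are made up for this example.

```shell
# Dry run by default: print the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Apply one action (init, start, or stop) to the master, then to the segment.
hawq_each() {
    action=$1
    run hawq "$action" master
    run hawq "$action" segment
}

hawq_each stop
hawq_each start
```

Run it as `DRY_RUN=0 sh script.sh` after sourcing greenplum_path.sh to execute the commands for real.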

Connect and Run basic queries

The remaining steps are not settled yet; this section will be updated after trying a deployment in a real environment.

<– To Be Continued

NYN-Yaah // Code Nine

..-. . -.