What is HAWQ

2018-05-07

I need to learn HAWQ, so here is my rough translation of the documentation for this project, which is currently in incubation.

What is HAWQ

HAWQ is a Hadoop native SQL query engine that combines the key technological advantages of MPP database with the scalability and convenience of Hadoop. HAWQ reads data from and writes data to HDFS natively.

- HAWQ 是一个 Hadoop 原生 SQL 查询引擎,整合了 MPP 数据库的技术优点和 Hadoop 的方便性。HAWQ 从 HDFS 读取和写入数据。

HAWQ delivers industry-leading performance and linear scalability. It provides users the tools to confidently and successfully interact with petabyte range data sets. HAWQ provides users with a complete, standards compliant SQL interface. More specifically, HAWQ has the following features:

- HAWQ 拥有业界领先的性能和线性扩展能力。它为使用者提供了可以放心、成功地和 PB 级数据集交互的工具。HAWQ 为使用者提供了完整的、符合标准的 SQL 接口。更具体地,HAWQ 有以下特点:

On-premise or cloud deployment

- 内部部署或云部署

Robust ANSI SQL compliance: SQL-92, SQL-99, SQL-2003, OLAP extension

- 强健的 ANSI SQL 支持:SQL-92,SQL-99,SQL-2003,OLAP 扩展

Extremely high performance: many times faster than other Hadoop SQL engines

- 极高的性能,数倍于其他Hadoop SQL 引擎

World-class parallel optimizer

- 世界级的并行优化器

Full transaction capability and consistency guarantee: ACID

- 完全事务支持和一致性保障:ACID

Dynamic data flow engine through high speed UDP based interconnect

- 基于内联UDP网络的高速动态数据流引擎

Elastic execution engine based on on-demand virtual segments and data locality

- 基于按需分配虚拟 segment 和数据局部性的弹性执行引擎

Support multiple level partitioning and List/Range based partitioned tables.

- 支持多级分区和基于 List/Range 的分区表

Multiple compression method support: snappy, gzip

- 支持多种压缩方式:snappy,gzip

Multi-language user defined function support: Python, Perl, Java, C/C++, R

- 多语言用户自定义函数支持:Python,Perl,Java,C/C++,R

Advanced machine learning and data mining functionalities through MADLib

- MADLib提供的先进机器学习和数据挖掘方法

Dynamic node expansion: in seconds

- 动态节点扩张:数秒内完成

Most advanced three level resource management: Integrate with YARN and hierarchical resource queues.

- 最先进的三级资源管理:整合YARN和分层资源队列

Easy access of all HDFS data and external system data (for example, HBase)

- 无障碍访问所有HDFS数据和外部系统数据(例如 HBase)

Hadoop Native: from storage (HDFS), resource management (YARN) to deployment (Ambari).

- 原生Hadoop:从存储(HDFS),资源管理(YARN)到部署(Ambari)

Authentication & granular authorization: Kerberos, SSL and role based access

- 认证和细粒度授权:Kerberos,SSL 和基于角色的访问控制

Advanced C/C++ access library to HDFS and YARN: libhdfs3 and libYARN

- 先进的 C/C++ HDFS和YARN访问库:libhdfs3 和 libYARN

Support for most third party tools: Tableau, SAS et al.

- 支持大部分第三方工具:Tableau,SAS 等

Standard connectivity: JDBC/ODBC

- 标准连接接口:JDBC/ODBC

HAWQ breaks complex queries into small tasks and distributes them to MPP query processing units for execution.

- HAWQ 将复杂的查询分成多个小任务并分配给 MPP 查询处理单元执行。

HAWQ’s basic unit of parallelism is the segment instance. Multiple segment instances on commodity servers work together to form a single parallel query processing system. A query submitted to HAWQ is optimized, broken into smaller components, and dispatched to segments that work together to deliver a single result set. All relational operations - such as table scans, joins, aggregations, and sorts - simultaneously execute in parallel across the segments. Data from upstream components in the dynamic pipeline are transmitted to downstream components through the scalable User Datagram Protocol (UDP) interconnect.

- HAWQ 的基本并行单元是 segment 实例。多个商用服务器上协同工作的 segment 实例组成了一个并行查询处理系统。被提交到 HAWQ 的查询会被优化,分解成更小的组件,并分发到多个 segment 上,由它们协同工作交付同一个结果集。所有关系操作,例如表的扫描、连接、聚合以及排序,同时在各个 segment 上并行执行。动态管线中上游组件的数据通过可伸缩的 UDP 内联网络被发送到下游组件。

Based on Hadoop’s distributed storage, HAWQ has no single point of failure and supports fully-automatic online recovery. System states are continuously monitored, therefore if a segment fails, it is automatically removed from the cluster. During this process, the system continues serving customer queries, and the segments can be added back to the system when necessary.

- 基于 Hadoop 的分布式存储,HAWQ 不存在单点故障,并且支持完全自动的在线恢复。系统状态处于持续监控之下,如果一个 segment 发生故障,它将被自动从集群中移除。在这个过程中,HAWQ 系统将会继续处理用户查询,segment 可以在需要时被添加回系统。

HAWQ Architecture

In a typical HAWQ deployment, each slave node has one physical HAWQ segment, an HDFS DataNode and a NodeManager installed. Masters for HAWQ, HDFS and YARN are hosted on separate nodes.

- 典型的 HAWQ 部署中,每个 slave 节点上安装有一个物理 HAWQ segment,一个 HDFS DataNode,以及一个 NodeManager。HAWQ,HDFS 和 YARN 的 master 分别部署在不同的节点上。

The following diagram provides a high-level architectural view of a typical HAWQ deployment.

- 下图展示了一个典型 HAWQ 部署的高层结构图。

HAWQ architectural

HAWQ is tightly integrated with YARN, the Hadoop resource management framework, for query resource management. HAWQ caches containers from YARN in a resource pool and then manages those resources locally by leveraging HAWQ’s own finer-grained resource management for users and groups. To execute a query, HAWQ allocates a set of virtual segments according to the cost of a query, resource queue definitions, data locality and the current resource usage in the system. Then the query is dispatched to corresponding physical hosts, which can be a subset of nodes or the whole cluster. The HAWQ resource enforcer on each node monitors and controls the real time resources used by the query to avoid resource usage violations.

- HAWQ 与 Hadoop 资源管理框架 YARN 紧密集成,用于管理查询资源。HAWQ 将从 YARN 获取的容器缓存在资源池中,然后通过 HAWQ 自己的、面向用户和用户组的更细粒度的资源管理在本地管理这些资源。执行查询时,HAWQ 根据查询的代价、资源队列定义、数据局部性以及当前系统的资源使用情况分配一组虚拟 segment。接下来查询将被分发至相应的物理主机(可以是部分节点或者整个集群)。每个节点上的 HAWQ 资源执行器实时监控和控制查询所用的资源,以避免违规使用资源。

The following diagram provides another view of the software components that constitute HAWQ.

- 下图展示了 HAWQ 结构的软件组成。

HAWQ software components

HAWQ Master

The HAWQ master is the entry point to the system. It is the database process that accepts client connections and processes the SQL commands issued. The HAWQ master parses queries, optimizes queries, dispatches queries to segments and coordinates the query execution.

- HAWQ master 是系统的入口。它是接收客户端连接并处理所提交 SQL 命令的数据库进程。HAWQ master 解析查询,优化查询,将查询分发到 segment 并协调查询的执行。

End-users interact with HAWQ through the master and can connect to the database using client programs such as psql or application programming interfaces (APIs) such as JDBC or ODBC.

- 终端用户通过 master 和 HAWQ 交互,可以通过 psql 或者应用程序接口(例如 JDBC 或 ODBC)来连接数据库。

The master is where the global system catalog resides. The global system catalog is the set of system tables that contain metadata about the HAWQ system itself. The master does not contain any user data; data resides only on HDFS. The master authenticates client connections, processes incoming SQL commands, distributes workload among segments, coordinates the results returned by each segment, and presents the final results to the client program.

- master 上保存着全局系统目录。全局系统目录是一组记录 HAWQ 系统自身元数据的系统表。master 不包含任何用户数据;数据只存放在 HDFS 上。master 验证客户端连接,处理传入的 SQL 命令,在 segment 间分配工作负载,协调各 segment 返回的结果,并将最终结果呈现给客户端程序。

HAWQ Segment

In HAWQ, the segments are the units that process data simultaneously.

- HAWQ segment 是并行处理数据的单元。

There is only one physical segment on each host. Each segment can start many Query Executors (QEs) for each query slice. This makes a single segment act like multiple virtual segments, which enables HAWQ to better utilize all available resources.

- 每个主机上只有一个物理 segment。每个 segment 为每个查询片启动多个 查询执行器(QE)。这使得一个 segment 可以表现得像有多个虚拟 segment,以令 HAWQ 能更好地利用可用的资源。

A virtual segment behaves like a container for QEs. Each virtual segment has one QE for each slice of a query. The number of virtual segments used determines the degree of parallelism (DOP) of a query.

- 虚拟 segment 相当于 QE 的容器。每个虚拟 segment 为查询的每个查询片配备一个 QE。所使用的虚拟 segment 数量决定了查询的并行度(DOP)。
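
The relationship between virtual segments, slices and QEs stated above can be checked with a little arithmetic. The function below is only an illustration: its name and the example numbers are made up for this post and are not part of HAWQ.

```python
def query_executor_count(virtual_segments: int, slices: int) -> int:
    """Each virtual segment runs one QE for every slice of the query."""
    if virtual_segments < 0 or slices < 0:
        raise ValueError("counts must be non-negative")
    return virtual_segments * slices


# Hypothetical query: 3 slices executed on 8 virtual segments.
print(query_executor_count(virtual_segments=8, slices=3))  # 24
```

Doubling the number of virtual segments doubles the number of QEs working on each slice, which is what "degree of parallelism" refers to here.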

A segment differs from a master because it:

- segment 和 master 的不同之处在于:

Is stateless.

- 无状态

Does not store the metadata for each database and table.

- 不存储各数据库和表的元数据

Does not store data on the local file system.

- 不在本地文件系统存储数据

The master dispatches the SQL request to the segments along with the related metadata information to process. The metadata contains the HDFS URL for the required table. The segment accesses the corresponding data using this URL.

- master 分发 SQL 请求以及要处理的相关元数据到 segment 上。元数据包含需要用到的表的 HDFS URL 。segment 通过这个 URL 访问相应的数据。

HAWQ Interconnect

The interconnect is the networking layer of HAWQ. When a user connects to a database and issues a query, processes are created on each segment to handle the query. The interconnect refers to the inter-process communication between the segments, as well as the network infrastructure on which this communication relies. The interconnect uses standard Ethernet switching fabric.

- 内联网络是 HAWQ 的网络层。当用户连接到数据库并发起查询时,每个 segment 上会创建进程来处理这个查询。内联网络指的是 segment 之间的进程间通信,以及这种通信所依赖的网络基础设施。内联网络使用标准的以太网交换结构。

By default, the interconnect uses UDP (User Datagram Protocol) to send messages over the network. The HAWQ software performs the additional packet verification beyond what is provided by UDP. This means the reliability is equivalent to Transmission Control Protocol (TCP), and the performance and scalability exceeds that of TCP. If the interconnect used TCP, HAWQ would have a scalability limit of 1000 segment instances. With UDP as the current default protocol for the interconnect, this limit is not applicable.

- 内联网络默认使用 UDP 在网络上发送信息。HAWQ 软件在 UDP 自身提供的校验之外执行了额外的数据包校验。这意味着它的可靠性等同于 TCP,而性能和扩展性超过 TCP。如果内联网络使用 TCP,HAWQ 最多只能扩展至 1000 个 segment 实例。以 UDP 作为内联网络当前的默认协议时,没有这个限制。
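
To make the idea of "additional packet verification on top of UDP" concrete, here is a minimal simulation. It is not HAWQ's interconnect protocol: the packet layout (CRC32 plus a sequence number), the stop-and-wait retransmission, and every function name are assumptions chosen for illustration, and the lossy channel is simulated in memory rather than with real sockets.

```python
import random
import zlib


def make_packet(seq: int, payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte CRC32 and a 4-byte sequence number."""
    body = seq.to_bytes(4, "big") + payload
    return zlib.crc32(body).to_bytes(4, "big") + body


def parse_packet(packet: bytes):
    """Return (seq, payload), or None when the checksum does not match."""
    if len(packet) < 8:
        return None
    crc, body = int.from_bytes(packet[:4], "big"), packet[4:]
    if zlib.crc32(body) != crc:
        return None
    return int.from_bytes(body[:4], "big"), body[4:]


def lossy_channel(packet: bytes, rng: random.Random, loss=0.3, corrupt=0.2):
    """Simulate an unreliable datagram hop that drops or damages packets."""
    roll = rng.random()
    if roll < loss:
        return None                      # packet lost
    if roll < loss + corrupt:
        damaged = bytearray(packet)
        damaged[-1] ^= 0xFF              # flip the bits of the last byte
        return bytes(damaged)
    return packet


def send_reliably(chunks, rng: random.Random, max_tries=50):
    """Stop-and-wait: resend each chunk until the receiver accepts it.

    Simplification: acknowledgements are assumed to never get lost.
    """
    received = []
    for seq, chunk in enumerate(chunks):
        for _ in range(max_tries):
            arrived = lossy_channel(make_packet(seq, chunk), rng)
            parsed = parse_packet(arrived) if arrived is not None else None
            if parsed is not None and parsed[0] == seq:
                received.append(parsed[1])
                break                    # acknowledged, move to next chunk
        else:
            raise RuntimeError(f"chunk {seq} was never delivered")
    return received


data = [b"scan", b"join", b"aggregate"]
assert send_reliably(data, random.Random(0)) == data
```

The point of the sketch is that reliability can be layered on datagrams by the application itself, without paying for one TCP connection per pair of communicating processes.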

HAWQ Resource Manager

The HAWQ resource manager obtains resources from YARN and responds to resource requests. Resources are buffered by the HAWQ resource manager to support low latency queries. The HAWQ resource manager can also run in standalone mode. In these deployments, HAWQ manages resources by itself without YARN.

- HAWQ 资源管理器从 YARN 获取资源并响应资源请求。HAWQ 资源管理器缓存资源来支持低延迟查询。HAWQ 资源管理器也能以独立模式运行。在这种部署下,HAWQ 自行管理资源而不依赖 YARN。

See How HAWQ Manages Resources for more details on HAWQ resource management.

HAWQ Catalog Service

The HAWQ catalog service stores all metadata, such as UDF/UDT information, relation information, security information and data file locations.

- HAWQ 目录服务保存了所有的元数据,例如 UDF/UDT 信息,关系信息,安全信息和数据文件位置。

HAWQ Fault Tolerance Service

The HAWQ fault tolerance service (FTS) is responsible for detecting segment failures and accepting heartbeats from segments.

- HAWQ 容错服务(FTS)负责检测 segment 的故障和接收 segment 的心跳。

See Understanding the Fault Tolerance Service for more information on this service.

HAWQ Dispatcher

The HAWQ dispatcher dispatches query plans to a selected subset of segments and coordinates the execution of the query. The dispatcher and the HAWQ resource manager are the main components responsible for the dynamic scheduling of queries and the resources required to execute them.

- HAWQ 分发器将查询计划分发给选定的 segment 子集并协调查询的执行。分发器和 HAWQ 资源管理器是负责动态调度查询及其执行所需资源的主要组件。

Table Distribution and Storage

HAWQ stores all table data, except the system table, in HDFS. When a user creates a table, the metadata is stored on the master’s local file system and the table content is stored in HDFS.

- HAWQ 在 HDFS 上存储除了系统表外的所有表数据。当用户创建一个表,元数据被存储在 master 的本地文件系统,表内容存储在 HDFS 上。

In order to simplify table data management, all the data of one relation are saved under one HDFS folder.

- 为了简化表数据管理,所有属于同一关系的数据存放在相同 HDFS 文件夹下。

For all HAWQ table storage formats, AO (Append-Only) and Parquet, the data files are splittable, so that HAWQ can assign multiple virtual segments to consume one data file concurrently. This increases the degree of query parallelism.

- 对于所有的 HAWQ 表存储格式,AO(仅追加)和 Parquet,数据文件是可分割的,因此 HAWQ 可以分配多个虚拟 segment 来并发地读取同一个数据文件。这提升了查询并行度。
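
A small sketch of what "splittable" buys you: one file can be cut into byte ranges and the ranges handed to different virtual segments. The split size, the round-robin assignment and both function names are assumptions for illustration only; HAWQ's real split logic depends on the storage format.

```python
def split_file(file_size: int, split_size: int):
    """Return (offset, length) ranges that together cover the whole file."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [(offset, min(split_size, file_size - offset))
            for offset in range(0, file_size, split_size)]


def assign_round_robin(splits, virtual_segments: int):
    """Hand the splits to virtual segments in round-robin order."""
    plan = {v: [] for v in range(virtual_segments)}
    for i, split in enumerate(splits):
        plan[i % virtual_segments].append(split)
    return plan


splits = split_file(file_size=300, split_size=128)
print(splits)                         # [(0, 128), (128, 128), (256, 44)]
print(assign_round_robin(splits, 2))  # {0: [(0, 128), (256, 44)], 1: [(128, 128)]}
```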

Table Distribution Policy

The default table distribution policy in HAWQ is random.

- HAWQ 默认的表分布策略是随机的。

Randomly distributed tables have some benefits over hash distributed tables. For example, after cluster expansion, HAWQ can use more resources automatically without redistributing the data. For huge tables, redistribution is very expensive, and data locality for randomly distributed tables is better after the underlying HDFS redistributes its data during rebalance or DataNode failures. This is quite common when the cluster is large.

- 随机分布的表相对于 Hash 分布的表有一些优点。例如,在集群扩容后,HAWQ 可以自动地使用更多资源而不用重分布数据。对于大表,重分布的代价非常高昂;而且当底层 HDFS 因再平衡或 DataNode 故障而重新分布数据之后,随机分布的表的数据局部性更好。这种情况在大集群中相当常见。

On the other hand, for some queries, hash distributed tables are faster than randomly distributed tables. For example, hash distributed tables have some performance benefits for some TPC-H queries. You should choose the distribution policy that is best suited for your application’s scenario.

- 另一方面,对于某一些查询,Hash 分布比随机分布更快。例如,对于 TPC-H 查询,Hash 分布有一些性能上的优势。用户应该选择最适合自己应用场景的分布策略。
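
The cost of redistribution for hash distributed tables can be illustrated with a toy model. The hash function (CRC32 modulo the bucket count) is an assumption picked for the demo and is not HAWQ's actual hashing; the key names are made up as well.

```python
import zlib


def hash_bucket(key: str, buckets: int) -> int:
    """Deterministic bucket for a distribution key (CRC32 is a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % buckets


def moved_fraction(keys, old_buckets: int, new_buckets: int) -> float:
    """Fraction of rows whose bucket changes when the bucket count changes."""
    moved = sum(hash_bucket(k, old_buckets) != hash_bucket(k, new_buckets)
                for k in keys)
    return moved / len(keys)


keys = [f"order-{i}" for i in range(10_000)]
# Growing from 4 to 6 buckets forces most rows to move to a new bucket.
print(f"{moved_fraction(keys, 4, 6):.0%} of rows would have to be rewritten")
```

A randomly distributed table has no key-to-bucket mapping to preserve, so nothing has to be rewritten after expansion. The price is that rows with equal keys are not co-located, which is where hash distribution can win for some joins and aggregations.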

See Choosing the Table Distribution Policy for more details.

Data Locality

Data is distributed across HDFS DataNodes. Since remote read involves network I/O, a data locality algorithm improves the local read ratio. HAWQ considers three aspects when allocating data blocks to virtual segments:

- 数据分布于 HDFS 数据节点上。远程读取数据需要涉及网络 I/O,因此数据局部性算法能够提升本地读取效率。HAWQ 依据以下三点分配数据块给虚拟 segment:
  1. Ratio of local read
    • 本地读取率
  2. Continuity of file read
    • 文件读取连续性
  3. Data balance among virtual segments
    • 虚拟 segment 间的数据平衡
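
The three aspects listed above can be combined in a simple greedy assignment. This is a sketch under my own assumptions, not HAWQ's data locality algorithm: it prefers virtual segments on a host that holds a replica (local read), picks the least loaded candidate (balance), and breaks ties in favour of the virtual segment that got the previous block (continuity). Host and block names are invented.

```python
def assign_blocks(blocks, vseg_hosts):
    """Greedy block assignment.

    blocks:     list of (block_id, replica_hosts) in file order
    vseg_hosts: vseg_hosts[i] is the host that virtual segment i runs on
    Returns ({vseg_index: [block_id, ...]}, local_read_ratio).
    """
    plan = {i: [] for i in range(len(vseg_hosts))}
    local_reads = 0
    previous = None  # virtual segment that received the preceding block
    for block_id, replicas in blocks:
        local = [i for i, host in enumerate(vseg_hosts) if host in replicas]
        candidates = local or list(plan)  # fall back to a remote read
        chosen = min(candidates,
                     key=lambda i: (len(plan[i]), i != previous, i))
        plan[chosen].append(block_id)
        local_reads += bool(local)
        previous = chosen
    ratio = local_reads / len(blocks) if blocks else 0.0
    return plan, ratio


blocks = [
    ("b0", {"h1", "h2"}),
    ("b1", {"h2", "h3"}),
    ("b2", {"h1", "h3"}),
    ("b3", {"h4"}),        # no virtual segment runs on h4: remote read
]
plan, ratio = assign_blocks(blocks, vseg_hosts=["h1", "h2", "h3"])
print(plan)   # {0: ['b0'], 1: ['b1'], 2: ['b2', 'b3']}
print(ratio)  # 0.75
```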

External Data Access

HAWQ can access data in external files using the HAWQ Extension Framework (PXF). PXF is an extensible framework that allows HAWQ to access data in external sources as readable or writable HAWQ tables. PXF has built-in connectors for accessing data inside HDFS files, Hive tables, and HBase tables. PXF also integrates with HCatalog to query Hive tables directly. See Using PXF with Unmanaged Data for more details.

- HAWQ 可以通过使用 HAWQ 扩展框架(PXF)来访问外部文件。PXF 使 HAWQ 能够以可读或可写 HAWQ 表访问外部资源上的数据。PXF 具有内置的连接器用来访问 HDFS 文件,Hive 表以及 HBase 表。PXF 也整合了 HCatalog 来直接查询 Hive 表。

Users can create custom PXF connectors to access other parallel data stores or processing engines. Connectors are Java plug-ins that use the PXF API. For more information see PXF External Tables and API.

- 用户可以创建自定义 PXF 连接器来访问其他并行数据存储或处理引擎。连接器是使用 PXF API 的 Java 插件。

Elastic Query Execution Runtime

HAWQ uses dynamically allocated virtual segments to provide resources for query execution.

- HAWQ 使用动态分配的虚拟 segment 来为查询执行提供资源。

In HAWQ 1.x, the number of segments (compute resource carriers) used to run a query is fixed, no matter whether the underlying query is a big query requiring many resources or a small query requiring few resources. This architecture is simple, but it uses resources inefficiently.

- 在 HAWQ 1.x 版本中,用于执行查询的 segment 数量(计算资源载体)是固定的,无论执行的查询需要大量资源还是少量资源。这个结构很简单,但是不能有效地使用资源。

To address this issue, HAWQ now uses the elastic query execution runtime feature, which is based on virtual segments. HAWQ allocates virtual segments on demand based on the costs of queries. In other words, for big queries, HAWQ starts a large number of virtual segments, while for small queries HAWQ starts fewer virtual segments.

- 为了解决这个问题,HAWQ 现在使用基于虚拟 segment 的弹性查询执行运行时功能。HAWQ 根据查询的代价按需分配虚拟 segment。换句话说,对于大查询,HAWQ 启动较多的虚拟 segment,对于小查询,HAWQ 启动较少的虚拟 segment。

Storage

In HAWQ, the number of invoked segments varies based on cost of query. In order to simplify table data management, all data of one relation are saved under one HDFS folder.

- HAWQ 上依照查询花费启用不同数量的 segment。为了简化数据管理,所有属于同一个关系的数据保存在同一个 HDFS 文件夹下。

For all the HAWQ table storage formats, AO (Append-Only) and Parquet, the data files are splittable, so that HAWQ can assign multiple virtual segments to consume one data file concurrently to increase the parallelism of a query.

- 对于所有的 HAWQ 表存储格式,AO 和 Parquet,数据文件是可分割的。HAWQ 可以分配多个虚拟 segment 来并发地读取同一个数据文件,以提高查询的并行度。

Physical Segments and Virtual Segments

In HAWQ, only one physical segment needs to be installed on one host, in which multiple virtual segments can be started to run queries. HAWQ allocates multiple virtual segments distributed across different hosts on demand to run one query. Virtual segments are carriers (containers) for resources such as memory and CPU. Queries are executed by query executors in virtual segments.

- 在 HAWQ 中,一个节点上只会安装一个物理 segment,但是可以启用多个虚拟 segment 来执行查询。HAWQ 在不同的主机上分配多个虚拟 segment 来执行一个查询。虚拟 segment 是资源(例如内存和 CPU )的载体。查询由虚拟 segment 上的查询执行器执行。

Virtual Segment Allocation Policy

Different numbers of virtual segments are allocated based on virtual segment allocation policies. The following factors determine the number of virtual segments that are used for a query:

- 根据虚拟 segment 分配策略会分配不同数量的虚拟 segment。以下因素决定用于查询的虚拟 segment 数量:
  1. Resources available at the query running time
    • 查询运行时可用的资源量
  2. The cost of the query
    • 查询的代价
  3. The distribution of the table; in other words, randomly distributed tables and hash distributed tables
    • 表的分布;换句话说,随机表分布和Hash表分布
  4. Whether the query involves UDFs and external tables
    • 查询是否涉及 UDF 和外部表
  5. Specific server configuration parameters, such as default_hash_table_bucket_number for hash table queries and hawq_rm_nvseg_perquery_limit
    • 特定的服务器配置参数,例如用于 Hash 表查询的 default_hash_table_bucket_number,以及 hawq_rm_nvseg_perquery_limit
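
How such factors might interact can be sketched as a small function. Everything here is a toy model of my own: the linear cost rule, the `cost_per_vseg` parameter and the example values are assumptions, and the UDF and external table factor is not modelled. Only the two configuration parameter names come from the documentation.

```python
import math


def choose_virtual_segments(query_cost: float,
                            hash_distributed: bool,
                            available_vsegs: int,
                            default_hash_table_bucket_number: int,
                            hawq_rm_nvseg_perquery_limit: int,
                            cost_per_vseg: float = 1000.0) -> int:
    """Pick a virtual segment count for one query (toy model)."""
    if hash_distributed:
        # Assumption: a hash distributed table wants one vseg per bucket.
        wanted = default_hash_table_bucket_number
    else:
        # Assumption: random tables scale linearly with the estimated cost.
        wanted = max(1, math.ceil(query_cost / cost_per_vseg))
    # Never exceed the per-query limit or what is free right now.
    return max(0, min(wanted, hawq_rm_nvseg_perquery_limit, available_vsegs))


# Illustrative values only, not HAWQ defaults.
settings = dict(available_vsegs=64,
                default_hash_table_bucket_number=24,
                hawq_rm_nvseg_perquery_limit=512)
print(choose_virtual_segments(500, False, **settings))        # 1
print(choose_virtual_segments(50_000, False, **settings))     # 50
print(choose_virtual_segments(1_000_000, False, **settings))  # 64
print(choose_virtual_segments(1_000_000, True, **settings))   # 24
```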

Resource Management

HAWQ provides several approaches to resource management and includes several user-configurable options, including integration with YARN’s resource management.

- HAWQ 提供了多种资源管理的方法,并包含多个用户可配置选项,包括与 YARN 资源管理的集成。

HAWQ has the ability to manage resources by using the following mechanisms:

- HAWQ 可以通过以下功能管理资源:
  1. Global resource management. You can integrate HAWQ with the YARN resource manager to request or return resources as needed. If you do not integrate HAWQ with YARN, HAWQ exclusively consumes cluster resources and manages its own resources. If you integrate HAWQ with YARN, then HAWQ automatically fetches resources from YARN and manages those obtained resources through its internally defined resource queues. Resources are returned to YARN automatically when resources are not used anymore.
    • 全局资源管理。可以将 HAWQ 与 YARN 资源管理器集成,以按需请求或返还资源。如果不将 HAWQ 与 YARN 集成,HAWQ 将独占集群资源并自行管理资源。如果将 HAWQ 与 YARN 集成,HAWQ 会自动地从 YARN 获取资源,并通过其内部定义的资源队列来管理获取到的资源。资源不再被使用时,将被自动返还给 YARN。
  2. User defined hierarchical resource queues. HAWQ administrators or superusers design and define the resource queues used to organize the distribution of resources to queries.
    • 用户自定义多层级资源队列。HAWQ 管理员或者超级用户设计和定义用于组织查询资源分布的资源队列。
  3. Dynamic resource allocation at query runtime. HAWQ dynamically allocates resources based on resource queue definitions. HAWQ automatically distributes resources based on running (or queued) queries and resource queue capacities.
    • 查询运行时动态资源分配。HAWQ 根据资源队列的定义,动态地分配资源。HAWQ 根据正在运行的(或排队中的)查询以及资源队列的容量,自动地分配资源。
  4. Resource limitations on virtual segments and queries. You can configure HAWQ to enforce limits on CPU and memory usage both for virtual segments and the resource queues used by queries.
    • 虚拟 segment 和查询的资源限制。用户可以配置 HAWQ 为虚拟 segment 和被查询使用的资源队列添加 CPU 和 内存使用限制。

For more details on resource management in HAWQ and how it works, see Managing Resources.

HDFS Catalog Cache

HDFS catalog cache is a caching service used by HAWQ master to determine the distribution information of table data on HDFS.

- HDFS 目录缓存是 HAWQ master 使用的一种缓存服务,用于确定 HDFS 上表数据的分布信息。

HDFS is slow at RPC handling, especially when the number of concurrent requests is high. In order to decide which segments handle which part of data, HAWQ needs data location information from HDFS NameNodes. HDFS catalog cache is used to cache the data location information and accelerate HDFS RPCs.

- HDFS 处理 RPC 较慢,尤其是并发请求很高时。为了确定节点负责的数据部分,HAWQ 需要从 HDFS NameNode 获取数据定位信息。HDFS 目录缓存就是用来缓存数据定位信息和加速 HDFS RPC 的。
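
The benefit of caching location information is easy to see in a toy version. The class below is not HAWQ's catalog cache; it is a plain memoizing wrapper, and the lookup function standing in for a NameNode RPC, the path and the host names are all invented.

```python
class BlockLocationCache:
    """Remember block locations so repeated lookups skip the remote call."""

    def __init__(self, lookup):
        self._lookup = lookup   # callable: path -> list of DataNode hosts
        self._cache = {}
        self.remote_calls = 0   # how often the slow lookup was really used

    def locations(self, path):
        if path not in self._cache:
            self.remote_calls += 1
            self._cache[path] = self._lookup(path)
        return self._cache[path]

    def invalidate(self, path):
        """Forget a path, for example after its file was rewritten."""
        self._cache.pop(path, None)


def fake_namenode_lookup(path):
    """Stand-in for a NameNode RPC; the hosts are made up."""
    return ["datanode-1", "datanode-2", "datanode-3"]


cache = BlockLocationCache(fake_namenode_lookup)
for _ in range(100):
    cache.locations("/hawq/table_a/file_1")
print(cache.remote_calls)  # 1
```

A real cache also has to decide when entries become stale, which this sketch leaves to an explicit `invalidate` call.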

High Availability, Redundancy and Fault Tolerance

HAWQ ensures high availability for its clusters through system redundancy. HAWQ deployments utilize platform hardware redundancy, such as RAID for the master catalog, JBOD for segments and network redundancy for its interconnect layer. On the software level, HAWQ provides redundancy via master mirroring and dual cluster maintenance. In addition, HAWQ supports high availability NameNode configuration within HDFS.

- HAWQ 通过系统冗余保证了集群的高可用性。HAWQ 部署利用了平台硬件冗余,例如为 master 目录使用 RAID,为 segment 使用 JBOD,为内联网络层提供网络冗余。在软件层面,HAWQ 通过 master 镜像以及双集群维护提供冗余。此外,HAWQ 支持 HDFS 中的 NameNode 高可用配置。

To maintain cluster health, HAWQ uses a fault tolerance service based on heartbeats and on-demand probe protocols. It can identify newly added nodes dynamically and remove nodes from the cluster when they become unusable.

- 为了保持集群的健康,HAWQ 使用了基于心跳和按需探测协议的容错服务。它可以动态地识别新添加的节点,并在节点不可用时将其从集群中移除。

About High Availability

HAWQ employs several mechanisms for ensuring high availability. The foremost mechanisms are specific to HAWQ and include the following:

- HAWQ 使用了多种机制来保证高可用性。最主要的机制是 HAWQ 特有的,包括以下几点:
  1. Master mirroring. Clusters have a standby master in the event of failure of the primary master.
    • master 镜像。集群有一个后备 master 以防备主 master 故障。
  2. Dual clusters. Administrators can create a secondary cluster and synchronize its data with the primary cluster either through dual ETL or backup and restore mechanisms.
    • 双集群。管理员可以创建一个副集群,然后通过双重 ETL 或者备份和恢复机制使其数据与主集群保持同步。

In addition to high availability managed on the HAWQ level, you can enable high availability in HDFS for HAWQ by implementing the high availability feature for NameNodes. See HAWQ Filespaces and High Availability Enabled HDFS.

- 除了在 HAWQ 层面管理的高可用性之外,还可以通过为 NameNode 实现高可用特性,在 HDFS 中为 HAWQ 启用高可用性。

About Segment Fault Tolerance

In HAWQ, the segments are stateless. This ensures faster recovery and better availability.

- HAWQ 的 segment 是无状态的。这保证了更快的恢复和更高的可用性。

When a segment fails, the segment is removed from the resource pool. Queries are no longer dispatched to the segment. When the segment is operational again, the Fault Tolerance Service verifies its state and adds the segment back to the resource pool.

- segment 故障时,节点会从资源池中移除。查询将不会被分发到这个 segment。当 segment 恢复可用时,错误容忍服务检验它的状态,然后将 segment 添加回资源池。
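
The remove-and-re-add behaviour described above can be modelled with a heartbeat table. This is a minimal sketch, not the real Fault Tolerance Service: the timeout rule, the class and its method names are assumptions, and timestamps are plain numbers passed in by the caller so that the example stays deterministic.

```python
class FaultToleranceTracker:
    """Toy heartbeat tracker that decides which segments may receive queries."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen = {}  # segment name -> time of its latest heartbeat

    def heartbeat(self, segment: str, now: float) -> None:
        self.last_seen[segment] = now

    def resource_pool(self, now: float) -> set:
        """Segments whose latest heartbeat is recent enough."""
        return {s for s, t in self.last_seen.items()
                if now - t <= self.timeout}


fts = FaultToleranceTracker(timeout=10)
fts.heartbeat("seg1", now=0)
fts.heartbeat("seg2", now=0)
fts.heartbeat("seg1", now=8)              # seg2 goes silent
print(sorted(fts.resource_pool(now=15)))  # ['seg1']
fts.heartbeat("seg2", now=16)             # seg2 is operational again
print(sorted(fts.resource_pool(now=17)))  # ['seg1', 'seg2']
```

Because segments are stateless, putting one back into the pool needs no data recovery step in this model; a fresh heartbeat is enough.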

About Interconnect Redundancy

The interconnect refers to the inter-process communication between the segments and the network infrastructure on which this communication relies. You can achieve a highly available interconnect by deploying dual Gigabit Ethernet switches on your network and deploying redundant Gigabit connections to the HAWQ host (master and segment) servers.

- 内联网络指的是节点间进程通信和这个通信所依赖的网络。用户可以通过部署在网络上双重千兆以太网交换机和为 HAWQ 主机(master 和 segment)部署冗余千兆连接,以获得高可用性内联网络。

In order to use multiple NICs in HAWQ, NIC bonding is required.

- 要在 HAWQ 中使用多个 NIC,需要进行 NIC 绑定(bonding)。

Source

HAWQ System Overview

(。・`ω´・)

..-. . -.