Cloudera 1.1.0: Introduction and Installation Requirements



1. About Cloudera, CDH, and CM

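Cloudera is a commercial Hadoop company. Its commercial Hadoop product is CDH (Cloudera Distribution including Apache Hadoop), and Cloudera is also a sponsor of the Apache Software Foundation. CM (Cloudera Manager) is Cloudera's tool for installing, configuring, and monitoring CDH clusters.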

2. CDH 6.x Installation Requirements and Supported Versions

Reference: [CDH Requirements and Supported Versions (official documentation)](https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_requirements_supported_versions.html)

1) Operating system requirements

2) Database requirements

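CM and CDH ship with an embedded PostgreSQL database that is intended only for non-production environments. In production, you must configure an external database.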
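The following Python sketch illustrates the difference between an embedded and an external database. It reads the Cloudera Manager Server's database settings and flags the embedded one. Several details are assumptions drawn from typical CM installations rather than from this article: the file path `/etc/cloudera-scm-server/db.properties`, the `com.cloudera.cmf.db.*` keys, and the `EMBEDDED` setup value.

```python
"""Sketch: flag a Cloudera Manager Server that still uses the embedded PostgreSQL.

Assumptions (not confirmed by this article): CM keeps its connection settings in
/etc/cloudera-scm-server/db.properties, using the com.cloudera.cmf.db.* keys below.
"""
from pathlib import Path
from typing import Dict

DB_PROPERTIES = Path("/etc/cloudera-scm-server/db.properties")  # assumed location


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple Java-style key=value lines, skipping blanks and comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def uses_embedded_db(props: Dict[str, str]) -> bool:
    """Treat setupType=EMBEDDED (an assumed value) as the non-production embedded DB."""
    return props.get("com.cloudera.cmf.db.setupType", "").upper() == "EMBEDDED"


if __name__ == "__main__":
    if DB_PROPERTIES.exists():
        props = parse_properties(DB_PROPERTIES.read_text())
        if uses_embedded_db(props):
            print("Embedded PostgreSQL in use: not suitable for production")
        else:
            print("External database:",
                  props.get("com.cloudera.cmf.db.type"),
                  props.get("com.cloudera.cmf.db.host"))
    else:
        print(f"{DB_PROPERTIES} not found")
```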

Notes:

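When a process restarts, each service reads its configuration from the database that CM connects to. If that configuration cannot be read, the cluster will not start correctly, so back up these databases in advance. For details, see [Backing Up Databases](https://www.cloudera.com/documentation/enterprise/6/latest/topics/cm_ag_backup_dbs.html#xd_583c10bfdbd326ba--6eed2fb8-14349d04bee--7e98).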
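A pre-restart backup can be scripted. The sketch below only builds one `mysqldump` or `pg_dump` command per database and writes each dump to a timestamped file. It is a minimal illustration, not Cloudera's procedure. The database names, host, and user are hypothetical placeholders, so replace them with the ones your CM and CDH services actually use.

```python
"""Sketch: build one dump command per CM/CDH database before restarting services.

Assumptions: the database names, host and user are hypothetical placeholders.
Commands are only built and printed here, not executed.
"""
import shlex
from datetime import datetime
from typing import List

# Hypothetical database names; replace with the ones your cluster actually uses.
DATABASES = ["scm", "amon", "rman", "hue", "metastore", "oozie"]


def dump_command(engine: str, db: str, host: str, user: str, stamp: str) -> List[str]:
    """Return an argv list that dumps `db` to a timestamped file."""
    outfile = f"{db}-backup-{stamp}"
    if engine == "mysql":
        # -p with no value makes mysqldump prompt for the password.
        return ["mysqldump", f"--host={host}", f"--user={user}", "-p",
                f"--result-file={outfile}.sql", "--databases", db]
    if engine == "postgresql":
        # -F c writes PostgreSQL's custom archive format (restore with pg_restore).
        return ["pg_dump", "-h", host, "-U", user, "-F", "c",
                "-f", f"{outfile}.dump", db]
    raise ValueError(f"unsupported engine: {engine}")


if __name__ == "__main__":
    stamp = datetime.now().strftime("%Y%m%d%H%M%S")
    for name in DATABASES:
        print(shlex.join(dump_command("mysql", name, "db1.example.com", "root", stamp)))
```

Because the commands are returned as argument lists, they can be inspected or passed to `subprocess.run` without shell-quoting problems.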

Supported databases:

3) Java

CDH 6.x supports only Oracle JDK 8 (OpenJDK is not supported). The detailed list is as follows:
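To check a host against this requirement, you can inspect the output of `java -version`. The sketch below assumes two banner formats. Oracle JDK 8 prints `java version "1.8.0_NNN"` followed by a `Java(TM) SE Runtime Environment` line. OpenJDK prints `openjdk version ...`. Both are assumptions about typical banners, not a Cloudera-provided check.

```python
"""Sketch: decide from `java -version` output whether a host runs Oracle JDK 8.

Assumption: Oracle JDK 8 prints `java version "1.8.0_NNN"` and a
"Java(TM) SE Runtime Environment" line; OpenJDK prints `openjdk version`.
"""
import re
import subprocess


def is_oracle_jdk8(version_output: str) -> bool:
    """Return True only for Oracle JDK 1.8; OpenJDK and other versions are rejected."""
    if "openjdk" in version_output.lower():
        return False
    match = re.search(r'java version "1\.8\.0[_\d]*"', version_output)
    return bool(match) and "Java(TM) SE Runtime Environment" in version_output


if __name__ == "__main__":
    try:
        # `java -version` prints its banner on stderr, not stdout.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        print("java not found on PATH")
    else:
        print("Oracle JDK 8:", is_oracle_jdk8(result.stderr))
```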

4) Network and security requirements

TLS versions supported by CDH and CM:

| Component | Role | Name | Port | Version |
|---|---|---|---|---|
| Cloudera Manager | Cloudera Manager Server | | 7182 | TLS 1.2 |
| Cloudera Manager | Cloudera Manager Server | | 7183 | TLS 1.2 |
| Flume | | | 9099 | TLS 1.2 |
| Flume | | Avro Source/Sink | | TLS 1.2 |
| Flume | | Flume HTTP Source/Sink | | TLS 1.2 |
| HBase | Master | HBase Master Web UI Port | 60010 | TLS 1.2 |
| HDFS | NameNode | Secure NameNode Web UI Port | 50470 | TLS 1.2 |
| HDFS | Secondary NameNode | Secure Secondary NameNode Web UI Port | 50495 | TLS 1.2 |
| HDFS | HttpFS | REST Port | 14000 | TLS 1.1, TLS 1.2 |
| Hive | HiveServer2 | HiveServer2 Port | 10000 | TLS 1.2 |
| Hue | Hue Server | Hue HTTP Port | 8888 | TLS 1.2 |
| Impala | Impala Daemon | Impala Daemon Beeswax Port | 21000 | TLS 1.2 |
| Impala | Impala Daemon | Impala Daemon HiveServer2 Port | 21050 | TLS 1.2 |
| Impala | Impala Daemon | Impala Daemon Backend Port | 22000 | TLS 1.2 |
| Impala | Impala StateStore | StateStore Service Port | 24000 | TLS 1.2 |
| Impala | Impala Daemon | Impala Daemon HTTP Server Port | 25000 | TLS 1.2 |
| Impala | Impala StateStore | StateStore HTTP Server Port | 25010 | TLS 1.2 |
| Impala | Impala Catalog Server | Catalog Server HTTP Server Port | 25020 | TLS 1.2 |
| Impala | Impala Catalog Server | Catalog Server Service Port | 26000 | TLS 1.2 |
| Oozie | Oozie Server | Oozie HTTPS Port | 11443 | TLS 1.1, TLS 1.2 |
| Solr | Solr Server | Solr HTTP Port | 8983 | TLS 1.1, TLS 1.2 |
| Solr | Solr Server | Solr HTTPS Port | 8985 | TLS 1.1, TLS 1.2 |
| Spark | History Server | | 18080 | TLS 1.2 |
| YARN | ResourceManager | ResourceManager Web Application HTTP Port | 8090 | TLS 1.2 |
| YARN | JobHistory Server | MRv1 JobHistory Web Application HTTP Port | 19890 | TLS 1.2 |
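You can spot-check the TLS table by connecting to a port and reading the negotiated protocol with Python's standard `ssl` module. The sketch below encodes only a subset of the table. It disables certificate verification because it only probes protocol versions, which is an assumption about your testing setup. Recent Python versions refuse protocols older than TLS 1.2 by default, so a TLS 1.1-only endpoint will fail the handshake rather than report `TLSv1.1`.

```python
"""Sketch: compare a port's negotiated TLS version with the table above.

Only a subset of the table is encoded. Certificate checks are disabled because
this probes protocol versions only; do not reuse this context for real traffic.
"""
import socket
import ssl
from typing import Dict, Optional, Set

# Port -> TLS versions listed in the table (subset), in ssl's naming.
ALLOWED: Dict[int, Set[str]] = {
    7183: {"TLSv1.2"},              # Cloudera Manager Server
    14000: {"TLSv1.1", "TLSv1.2"},  # HDFS HttpFS REST Port
    10000: {"TLSv1.2"},             # HiveServer2 Port
    11443: {"TLSv1.1", "TLSv1.2"},  # Oozie HTTPS Port
    8985: {"TLSv1.1", "TLSv1.2"},   # Solr HTTPS Port
}


def is_allowed(port: int, negotiated: Optional[str]) -> bool:
    """True if the negotiated protocol is listed for this port."""
    return negotiated in ALLOWED.get(port, set())


def negotiated_version(host: str, port: int, timeout: float = 5.0) -> Optional[str]:
    """Open a TLS connection and return the protocol string, e.g. 'TLSv1.2'."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE     # clusters often use self-signed certs
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

For example, `is_allowed(7183, negotiated_version("cm-host.example.com", 7183))` checks the Cloudera Manager Server port on a hypothetical host.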

Unix users and groups used by CDH and CM:

| Component (Version) | Unix User ID | Groups | Functionality |
|---|---|---|---|
| Cloudera Manager (all versions) | cloudera-scm | cloudera-scm | Clusters managed by Cloudera Manager run Cloudera Manager Server, monitoring roles, and other Cloudera Server processes as cloudera-scm. Requires a keytab file named `cmf.keytab` because the name is hard-coded in Cloudera Manager. |
| Apache Accumulo | accumulo | accumulo | Accumulo processes run as this user. |
| Apache Flume | flume | flume | The sink that writes to HDFS as this user must have write privileges. |
| Apache HBase | hbase | hbase | The Master and the RegionServer processes run as this user. |
| HDFS | hdfs | hdfs, hadoop | The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it. |
| Apache Hive | hive | hive | The HiveServer2 process and the Hive Metastore processes run as this user. A user must be defined for Hive access to its Metastore DB (for example, MySQL or Postgres), but it can be any identifier and does not correspond to a Unix uid. This is `javax.jdo.option.ConnectionUserName` in `hive-site.xml`. |
| Apache HCatalog | hive | hive | The WebHCat service (for REST access to Hive functionality) runs as the hive user. |
| HttpFS | httpfs | httpfs | The HttpFS service runs as this user. See HttpFS Security Configuration for instructions on how to generate the merged `httpfs-http.keytab` file. |
| Hue | hue | hue | Hue services run as this user. |
| Hue Load Balancer | apache | apache | The Hue Load Balancer depends on the apache2 package, which uses the apache user name. Cloudera Manager does not run processes using this user ID. |
| Impala | impala | impala, hive | Impala services run as this user. |
| Apache Kafka | kafka | kafka | Kafka services run as this user. |
| Java KeyStore KMS | kms | kms | The Java KeyStore KMS service runs as this user. |
| Key Trustee KMS | kms | kms | The Key Trustee KMS service runs as this user. |
| Key Trustee Server | keytrustee | keytrustee | The Key Trustee Server service runs as this user. |
| Kudu | kudu | kudu | Kudu services run as this user. |
| MapReduce | mapred | mapred, hadoop | Without Kerberos, the JobTracker and tasks run as this user. With Kerberos, the LinuxTaskController binary is owned by this user. |
| Apache Oozie | oozie | oozie | The Oozie service runs as this user. |
| Parquet | ~ | ~ | No special users. |
| Apache Pig | ~ | ~ | No special users. |
| Cloudera Search | solr | solr | The Solr processes run as this user. |
| Apache Spark | spark | spark | The Spark History Server process runs as this user. |
| Apache Sentry | sentry | sentry | The Sentry service runs as this user. |
| Apache Sqoop | sqoop | sqoop | This user is only for the Sqoop1 Metastore, a configuration option that is not recommended. |
| YARN | yarn | yarn, hadoop | Without Kerberos, all YARN services and applications run as this user. With Kerberos, the LinuxContainerExecutor binary is owned by this user. |
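Before installation, you can check that the service accounts listed in the user table exist and belong to the expected groups. The Python sketch below uses the standard `pwd` and `grp` modules, which are Unix only, and encodes only a subset of the table. The group lookup can be injected, so the logic can be exercised without touching the real `/etc/passwd` and `/etc/group`.

```python
"""Sketch: verify that service users from the table exist with the expected groups.

Only a subset of the table is encoded. pwd/grp are Unix-only standard modules.
"""
import grp
import pwd
from typing import Callable, Dict, List, Set

# user -> groups it should belong to (subset of the user table)
EXPECTED: Dict[str, List[str]] = {
    "cloudera-scm": ["cloudera-scm"],
    "hdfs": ["hdfs", "hadoop"],
    "yarn": ["yarn", "hadoop"],
    "mapred": ["mapred", "hadoop"],
    "impala": ["impala", "hive"],
    "hive": ["hive"],
}


def system_groups_of(user: str) -> Set[str]:
    """Primary plus supplementary group names of a local user; KeyError if absent."""
    entry = pwd.getpwnam(user)
    groups = {grp.getgrgid(entry.pw_gid).gr_name}
    groups.update(g.gr_name for g in grp.getgrall() if user in g.gr_mem)
    return groups


def find_problems(expected: Dict[str, List[str]],
                  groups_of: Callable[[str], Set[str]] = system_groups_of) -> List[str]:
    """List missing users and missing group memberships, in table order."""
    problems: List[str] = []
    for user, wanted in expected.items():
        try:
            actual = groups_of(user)
        except KeyError:
            problems.append(f"missing user: {user}")
            continue
        for group in wanted:
            if group not in actual:
                problems.append(f"{user} not in group {group}")
    return problems
```

On a host prepared for CDH, `find_problems(EXPECTED)` should return an empty list.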

3. Bare-Metal Deployment Guide

[Cloudera Enterprise Reference Architecture for Bare Metal Deployments (PDF)](http://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_metal.pdf)

1) Java

CM and CDH use Oracle JDK; OpenJDK is not supported.

2) Appropriately sized server configurations

For production environments, Cloudera recommends deploying three to four machine types:

4. Further Reading

Before You Install