24 Sep 2018
Cloudera is a commercial Hadoop vendor whose product is CDH (Cloudera Distribution including Apache Hadoop). Cloudera is also a sponsor of the Apache Software Foundation.
Reference: [Official CDH installation requirements documentation](https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_requirements_supported_versions.html)
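Add the noatime option to the data disk entries; adjust the device and mount point to your environment:

`/dev/sdb1 /data1 ext4 defaults,noatime 0`

Remount for the option to take effect, again using your actual mount point:

`mount -o remount /data1`

`/dev/sdb1 /data1 ext4 defaults,noatime,nosync 0`

CM and CDH ship with an embedded PostgreSQL database that is intended for non-production use only. In production, configure an external database instead.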
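
The noatime setting described above is easy to miss on one of many data disks. The following is a minimal sketch, not an official Cloudera tool, that scans fstab-style text and reports which of the given mount points lack the option. The function name `mounts_missing_noatime` and the sample entries (including `/data2`) are assumptions made for illustration.

```python
def mounts_missing_noatime(fstab_text, mount_points):
    """Return the mount points in mount_points whose fstab entry lacks noatime."""
    wanted = set(mount_points)
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # An fstab entry needs at least: device, mount point, type, options.
        if len(fields) < 4:
            continue
        mount_point = fields[1]
        options = fields[3].split(",")
        if mount_point in wanted and "noatime" not in options:
            missing.append(mount_point)
    return missing


# Hypothetical sample content; on a real host, read /etc/fstab instead.
sample = """\
# device   mount point  type  options            dump pass
/dev/sdb1  /data1       ext4  defaults,noatime   0    0
/dev/sdc1  /data2       ext4  defaults           0    0
"""

print(mounts_missing_noatime(sample, ["/data1", "/data2"]))  # ['/data2']
```

On a real node the text would come from `open("/etc/fstab").read()`, and the mount point list would be whatever data directories the cluster actually uses.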
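Notes:
When a process restarts, each service reads its configuration from the database that CM connects to. If the configuration cannot be read, the cluster will not start correctly, so the database must be backed up in advance. See [Database backup](https://www.cloudera.com/documentation/enterprise/6/latest/topics/cm_ag_backup_dbs.html#xd_583c10bfdbd326ba--6eed2fb8-14349d04bee--7e98).
Supported databases:
CDH 6.x supports only Oracle JDK 8 (OpenJDK is not supported). The detailed list follows: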
TLS versions supported by CDH and CM

| Component | Role | Name | Port | Version |
|---|---|---|---|---|
| Cloudera Manager | Cloudera Manager Server | | 7182 | TLS 1.2 |
| Cloudera Manager | Cloudera Manager Server | | 7183 | TLS 1.2 |
| Flume | | | 9099 | TLS 1.2 |
| Flume | | Avro Source/Sink | | TLS 1.2 |
| Flume | | Flume HTTP Source/Sink | | TLS 1.2 |
| HBase | Master | HBase Master Web UI Port | 60010 | TLS 1.2 |
| HDFS | NameNode | Secure NameNode Web UI Port | 50470 | TLS 1.2 |
| HDFS | Secondary NameNode | Secure Secondary NameNode Web UI Port | 50495 | TLS 1.2 |
| HDFS | HttpFS | REST Port | 14000 | TLS 1.1, TLS 1.2 |
| Hive | HiveServer2 | HiveServer2 Port | 10000 | TLS 1.2 |
| Hue | Hue Server | Hue HTTP Port | 8888 | TLS 1.2 |
| Impala | Impala Daemon | Impala Daemon Beeswax Port | 21000 | TLS 1.2 |
| Impala | Impala Daemon | Impala Daemon HiveServer2 Port | 21050 | TLS 1.2 |
| Impala | Impala Daemon | Impala Daemon Backend Port | 22000 | TLS 1.2 |
| Impala | Impala StateStore | StateStore Service Port | 24000 | TLS 1.2 |
| Impala | Impala Daemon | Impala Daemon HTTP Server Port | 25000 | TLS 1.2 |
| Impala | Impala StateStore | StateStore HTTP Server Port | 25010 | TLS 1.2 |
| Impala | Impala Catalog Server | Catalog Server HTTP Server Port | 25020 | TLS 1.2 |
| Impala | Impala Catalog Server | Catalog Server Service Port | 26000 | TLS 1.2 |
| Oozie | Oozie Server | Oozie HTTPS Port | 11443 | TLS 1.1, TLS 1.2 |
| Solr | Solr Server | Solr HTTP Port | 8983 | TLS 1.1, TLS 1.2 |
| Solr | Solr Server | Solr HTTPS Port | 8985 | TLS 1.1, TLS 1.2 |
| Spark | History Server | | 18080 | TLS 1.2 |
| YARN | ResourceManager | ResourceManager Web Application HTTP Port | 8090 | TLS 1.2 |
| YARN | JobHistory Server | MRv1 JobHistory Web Application HTTP Port | 19890 | TLS 1.2 |
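
To spot-check one of the ports in the table, a client can be restricted to TLS 1.2 and asked to complete a handshake. This is a rough sketch using Python's standard `ssl` module (Python 3.7 or later). The function names are made up for this example, and certificate verification is deliberately disabled on the assumption that the cluster uses self-signed or internal CA certificates; it checks protocol support only, not certificate trust.

```python
import socket
import ssl


def tls12_only_context():
    """Build a client context that negotiates TLS 1.2 and nothing else."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Assumption: certificates are not verified, because this is only a
    # protocol-version probe against internal endpoints.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def accepts_tls12(host, port, timeout=5.0):
    """Return True if host:port completes a TLS 1.2 handshake."""
    ctx = tls12_only_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return tls.version() == "TLSv1.2"
    except OSError:
        # Covers refused connections, timeouts, and failed handshakes
        # (ssl.SSLError is a subclass of OSError).
        return False


# Example call with a hypothetical hostname:
# accepts_tls12("cm-host.example.com", 7183)
```

A `False` result does not distinguish between a closed port and a server that rejects TLS 1.2, so treat it as a prompt to investigate rather than a verdict.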
Network and security requirements
The /etc/sysconfig/network file must be configured with the correct hostname.

| Component (Version) | Unix User ID | Groups | Functionality |
|---|---|---|---|
| Cloudera Manager (all versions) | cloudera-scm | cloudera-scm | Clusters managed by Cloudera Manager run Cloudera Manager Server, monitoring roles, and other Cloudera Server processes as cloudera-scm. Requires a keytab file named cmf.keytab because the name is hard-coded in Cloudera Manager. |
| Apache Accumulo | accumulo | accumulo | Accumulo processes run as this user. |
| Apache Flume | flume | flume | The sink that writes to HDFS as this user must have write privileges. |
| Apache HBase | hbase | hbase | The Master and the RegionServer processes run as this user. |
| HDFS | hdfs | hdfs, hadoop | The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it. |
| Apache Hive | hive | hive | The HiveServer2 process and the Hive Metastore processes run as this user. A user must be defined for Hive access to its Metastore DB (for example, MySQL or Postgres) but it can be any identifier and does not correspond to a Unix uid. This is javax.jdo.option.ConnectionUserName in hive-site.xml. |
| Apache HCatalog | hive | hive | The WebHCat service (for REST access to Hive functionality) runs as the hive user. |
| HttpFS | httpfs | httpfs | The HttpFS service runs as this user. See HttpFS Security Configuration for instructions on how to generate the merged httpfs-http.keytab file. |
| Hue | hue | hue | Hue services run as this user. |
| Hue Load Balancer | apache | apache | The Hue Load Balancer has a dependency on the apache2 package that uses the apache user name. Cloudera Manager does not run processes using this user ID. |
| Impala | impala | impala, hive | Impala services run as this user. |
| Apache Kafka | kafka | kafka | Kafka services run as this user. |
| Java KeyStore KMS | kms | kms | The Java KeyStore KMS service runs as this user. |
| Key Trustee KMS | kms | kms | The Key Trustee KMS service runs as this user. |
| Key Trustee Server | keytrustee | keytrustee | The Key Trustee Server service runs as this user. |
| Kudu | kudu | kudu | Kudu services run as this user. |
| MapReduce | mapred | mapred, hadoop | Without Kerberos, the JobTracker and tasks run as this user. The LinuxTaskController binary is owned by this user for Kerberos. |
| Apache Oozie | oozie | oozie | The Oozie service runs as this user. |
| Parquet | ~ | ~ | No special users. |
| Apache Pig | ~ | ~ | No special users. |
| Cloudera Search | solr | solr | The Solr processes run as this user. |
| Apache Spark | spark | spark | The Spark History Server process runs as this user. |
| Apache Sentry | sentry | sentry | The Sentry service runs as this user. |
| Apache Sqoop | sqoop | sqoop | This user is only for the Sqoop1 Metastore, a configuration option that is not recommended. |
| YARN | yarn | yarn, hadoop | Without Kerberos, all YARN services and applications run as this user. The LinuxContainerExecutor binary is owned by this user for Kerberos. |
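
Because the services in the table above run under dedicated accounts, it can be useful to confirm after installation that the expected users exist on a host. The sketch below parses passwd-style text and lists the missing names. The helper `missing_accounts`, the chosen subset of account names, and the sample UIDs and home directories are illustrative assumptions, not values taken from Cloudera's documentation.

```python
# A subset of the accounts listed in the table; extend as needed.
EXPECTED_ACCOUNTS = ["cloudera-scm", "hdfs", "yarn", "hive", "impala"]


def missing_accounts(passwd_text, expected):
    """Return the names in expected that have no entry in passwd_text."""
    present = set()
    for line in passwd_text.splitlines():
        if line and not line.startswith("#"):
            # The user name is the first colon-separated field.
            present.add(line.split(":", 1)[0])
    return [name for name in expected if name not in present]


# Hypothetical sample; on a real host, read /etc/passwd instead.
sample_passwd = """\
root:x:0:0:root:/root:/bin/bash
hdfs:x:996:993:Hadoop HDFS:/var/lib/hadoop-hdfs:/bin/bash
yarn:x:995:992:Hadoop Yarn:/var/lib/hadoop-yarn:/bin/bash
"""

print(missing_accounts(sample_passwd, EXPECTED_ACCOUNTS))
# ['cloudera-scm', 'hive', 'impala']
```

This only looks at local accounts; hosts that resolve users through LDAP or another directory service would need a lookup through the standard `pwd` module instead.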
[Cloudera Enterprise Reference Architecture for Bare Metal Deployments (PDF)](http://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_metal.pdf)
CM and CDH use the Oracle JDK; OpenJDK is not supported.
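
The JDK requirement above can be checked before installation by inspecting what `java -version` reports. The following sketch classifies that output as Oracle JDK 8 or not. The function names are assumptions for this example, and the check relies on the banner text that Oracle JDK 8 and OpenJDK 8 builds typically print, so treat it as a heuristic.

```python
import subprocess


def is_oracle_jdk8(version_output):
    """Heuristically decide whether `java -version` output is Oracle JDK 8."""
    text = version_output.lower()
    lines = text.strip().splitlines()
    first_line = lines[0] if lines else ""
    # JDK 8 reports its version as "1.8.x"; OpenJDK builds name themselves.
    is_java8 = '"1.8.' in first_line
    is_openjdk = "openjdk" in text
    return is_java8 and not is_openjdk


def current_java_version_output():
    """Run `java -version`; the JVM prints the banner to stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return result.stderr


oracle_sample = (
    'java version "1.8.0_181"\n'
    "Java(TM) SE Runtime Environment (build 1.8.0_181-b13)\n"
    "Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)\n"
)
openjdk_sample = (
    'openjdk version "1.8.0_181"\n'
    "OpenJDK Runtime Environment (build 1.8.0_181-b13)\n"
    "OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)\n"
)

print(is_oracle_jdk8(oracle_sample))   # True
print(is_oracle_jdk8(openjdk_sample))  # False
```

On a target host, `is_oracle_jdk8(current_java_version_output())` would apply the same check to whichever `java` is first on the PATH.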
Cloudera recommends deploying three to four machine types in a production environment: