Advanced Spark: Configuring the History (Log) Server
Author: Yin Zhengjie
Copyright notice: This is an original work. Reposting is not permitted; violators will be held legally liable.
If you are still figuring out how to configure a Spark Standalone cluster, you can refer to the notes I shared earlier: https://www.cnblogs.com/yinzhengjie/p/9379045.html. The focus of this post, however, is how to configure the history server and persist its logs on HDFS.
I. Preparing the experiment environment
1>.Cluster management scripts
[yinzhengjie@s101 ~]$ more `which xcall.sh`
#!/bin/bash
#@author :yinzhengjie
#blog:http://www.cnblogs.com/yinzhengjie
#EMAIL:y1053419035@qq.com

# Check whether the user passed any arguments
if [ $# -lt 1 ];then
    echo "Please provide an argument"
    exit
fi

# Capture the command entered by the user
cmd=$@

for (( i=101;i<=105;i++ ))
do
    # Turn the terminal text green
    tput setaf 2
    echo ============= s$i $cmd ============
    # Restore the terminal to its original color (light gray)
    tput setaf 7
    # Run the command on the remote host
    ssh s$i $cmd
    # Check whether the command succeeded
    if [ $? == 0 ];then
        echo "Command executed successfully"
    fi
done
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ more `which xrsync.sh`
#!/bin/bash
#@author :yinzhengjie
#blog:http://www.cnblogs.com/yinzhengjie
#EMAIL:y1053419035@qq.com

# Check whether the user passed any arguments
if [ $# -lt 1 ];then
    echo "Please provide an argument";
    exit
fi

# Get the file path
file=$@

# Get the file name
filename=`basename $file`

# Get the parent directory
dirpath=`dirname $file`

# Get the absolute path
cd $dirpath
fullpath=`pwd -P`

# Sync the file to the DataNodes
for (( i=102;i<=105;i++ ))
do
    # Turn the terminal text green
    tput setaf 2
    echo =========== s$i $file ===========
    # Restore the terminal to its original color (light gray)
    tput setaf 7
    # Run the command on the remote host
    rsync -lr $filename `whoami`@s$i:$fullpath
    # Check whether the command succeeded
    if [ $? == 0 ];then
        echo "Command executed successfully"
    fi
done
[yinzhengjie@s101 ~]$
2>.Start the HDFS distributed file system
[yinzhengjie@s101 ~]$ xcall.sh jps
============= s101 jps ============
18546 DFSZKFailoverController
18234 NameNode
18991 Jps
Command executed successfully
============= s102 jps ============
12980 QuorumPeerMain
13061 DataNode
13382 Jps
13147 JournalNode
Command executed successfully
============= s103 jps ============
13072 Jps
12836 JournalNode
12663 QuorumPeerMain
12750 DataNode
Command executed successfully
============= s104 jps ============
12455 QuorumPeerMain
12537 DataNode
12862 Jps
12623 JournalNode
Command executed successfully
============= s105 jps ============
12337 Jps
12151 DFSZKFailoverController
12043 NameNode
Command executed successfully
[yinzhengjie@s101 ~]$
3>.Verify that the services started successfully
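Besides listing the Java processes with jps, the HDFS side can be sanity-checked from the command line as well. A minimal sketch, assuming the hdfs client on s101 is configured against this cluster (in Hadoop 2.x the NameNode web UIs also listen on port 50070 by default, e.g. http://s101:50070 and http://s105:50070):

# Show the live DataNodes and overall capacity of the cluster
hdfs dfsadmin -report | head -n 20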
4>.Create the designated directory on HDFS for storing the log files
[yinzhengjie@s101 ~]$ hdfs dfs -mkdir -p /yinzhengjie/logs
[yinzhengjie@s101 ~]$ 
[yinzhengjie@s101 ~]$ hdfs dfs -ls -R /
drwxr-xr-x   - yinzhengjie supergroup          0 2018-08-13 15:19 /yinzhengjie
drwxr-xr-x   - yinzhengjie supergroup          0 2018-08-13 15:19 /yinzhengjie/logs
[yinzhengjie@s101 ~]$
II. Modifying the configuration files
1>.Check which HDFS NameNode is currently available (active)
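Because spark.eventLog.dir below points at one specific NameNode (s105:8020), that NameNode has to be the active one. A small sketch of how to check the HA state with the standard admin command; nn1 and nn2 are placeholder NameNode IDs, so substitute the dfs.ha.namenodes.* values from this cluster's hdfs-site.xml:

# Print the HA state (active/standby) of each NameNode.
# nn1/nn2 are placeholders for the NameNode IDs defined in hdfs-site.xml.
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2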
2>.Enable event logging (friendly reminder: the directory on HDFS must already exist)
[yinzhengjie@s101 ~]$ cp /soft/spark/conf/spark-defaults.conf.template /soft/spark/conf/spark-defaults.conf
[yinzhengjie@s101 ~]$ echo "spark.eventLog.enabled true" >> /soft/spark/conf/spark-defaults.conf
[yinzhengjie@s101 ~]$ echo "spark.eventLog.dir hdfs://s105:8020/yinzhengjie/logs" >> /soft/spark/conf/spark-defaults.conf
[yinzhengjie@s101 ~]$ 
[yinzhengjie@s101 ~]$ cat /soft/spark/conf/spark-defaults.conf | grep -v ^# | grep -v ^$
spark.eventLog.enabled true                              # Enables event logging
spark.eventLog.dir hdfs://s105:8020/yinzhengjie/logs     # Specifies where the event logs are stored
[yinzhengjie@s101 ~]$
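Since this HDFS cluster runs in HA mode, hard-coding hdfs://s105:8020 only works while s105 is the active NameNode. An alternative sketch, assuming the logical nameservice defined in core-site.xml is called mycluster (replace it with your own), is to point the event log directory at the nameservice so clients fail over automatically:

# spark-defaults.conf, HA-aware variant ("mycluster" is a placeholder nameservice name)
spark.eventLog.enabled    true
spark.eventLog.dir        hdfs://mycluster/yinzhengjie/logs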
3>.Modify the spark-env.sh file
[yinzhengjie@s101 ~]$ cat /soft/spark/conf/spark-env.sh | grep -v ^# | grep -v ^$
export JAVA_HOME=/soft/jdk
SPARK_MASTER_HOST=s101
SPARK_MASTER_PORT=7077
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=4000 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://s105:8020/yinzhengjie/logs"
[yinzhengjie@s101 ~]$ 

Parameter descriptions:
    spark.eventLog.dir                                                # All information produced while an application runs is recorded under the path this property points to.
    spark.history.ui.port=4000                                        # Changes the port of the history server web UI to 4000.
    spark.history.fs.logDirectory=hdfs://s105:8020/yinzhengjie/logs   # With this property set, start-history-server.sh no longer needs an explicit path; the Spark History Server page only shows the information found under this directory.
    spark.history.retainedApplications=3                              # The number of application histories to keep; once it is exceeded, the oldest application information is evicted. This is the number of applications held in memory, not the number shown on the page.
4>.Distribute the modified spark-env.sh configuration file to the other nodes
[yinzhengjie@s101 ~]$ xrsync.sh /soft/spark-2.1.1-bin-hadoop2.7/conf 
=========== s102 $file ===========
Command executed successfully
=========== s103 $file ===========
Command executed successfully
=========== s104 $file ===========
Command executed successfully
[yinzhengjie@s101 ~]$
III. Starting the history server
1>.Start the Spark cluster
[yinzhengjie@s101 ~]$ /soft/spark/sbin/start-all.sh 
starting org.apache.spark.deploy.master.Master, logging to /soft/spark/logs/spark-yinzhengjie-org.apache.spark.deploy.master.Master-1-s101.out
s104: starting org.apache.spark.deploy.worker.Worker, logging to /soft/spark/logs/spark-yinzhengjie-org.apache.spark.deploy.worker.Worker-1-s104.out
s102: starting org.apache.spark.deploy.worker.Worker, logging to /soft/spark/logs/spark-yinzhengjie-org.apache.spark.deploy.worker.Worker-1-s102.out
s103: starting org.apache.spark.deploy.worker.Worker, logging to /soft/spark/logs/spark-yinzhengjie-org.apache.spark.deploy.worker.Worker-1-s103.out
[yinzhengjie@s101 ~]$ xcall.sh jps
============= s101 jps ============
7025 Jps
6136 NameNode
6942 Master
6447 DFSZKFailoverController
Command executed successfully
============= s102 jps ============
2720 QuorumPeerMain
3652 DataNode
4040 Worker
3739 JournalNode
4095 Jps
Command executed successfully
============= s103 jps ============
2720 QuorumPeerMain
4165 Jps
3734 DataNode
3821 JournalNode
4110 Worker
Command executed successfully
============= s104 jps ============
4080 Worker
3781 JournalNode
4135 Jps
2682 QuorumPeerMain
3694 DataNode
Command executed successfully
============= s105 jps ============
3603 NameNode
4228 Jps
3710 DFSZKFailoverController
Command executed successfully
[yinzhengjie@s101 ~]$
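With the Master and the three Workers up, the standalone cluster can also be inspected in the Master web UI, which by default listens on port 8080 of the Master host (here s101). A quick reachability check from the shell, though normally you would just open the address in a browser:

# The Master web UI on s101 (standalone default port 8080) lists the registered Workers
curl -s http://s101:8080 | head -n 5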
2>.Start the history server
[yinzhengjie@s101 conf]$ start-history-server.sh 
starting org.apache.spark.deploy.history.HistoryServer, logging to /soft/spark/logs/spark-yinzhengjie-org.apache.spark.deploy.history.HistoryServer-1-s101.out
[yinzhengjie@s101 conf]$
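A quick way to confirm that the HistoryServer JVM actually came up on s101, using the same jps-based check as before and the startup log reported in the output above:

# The history server runs as its own JVM on s101
jps | grep HistoryServer

# Its startup log is the file reported by start-history-server.sh
tail -n 20 /soft/spark/logs/spark-yinzhengjie-org.apache.spark.deploy.history.HistoryServer-1-s101.out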
3>.Access the history server through the web UI
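Given the spark.history.ui.port=4000 setting in spark-env.sh and the fact that the history server was started on s101, the page should be reachable at http://s101:4000. As a quick check from the shell (normally you would open the address in a browser):

# Fetch only the response headers of the history server web UI
curl -sI http://s101:4000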
4>.Run a word count and exit the program ([yinzhengjie@s101 ~]$ spark-shell --master spark://s101:7077)
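The word count is run interactively in spark-shell and the shell is then exited, which is what makes the finished application show up in the history server. Below is a minimal non-interactive sketch of the same idea; the input path /yinzhengjie/wordcount.txt is a placeholder, so upload any text file to that location first:

# Feed a small Scala word count to spark-shell; the shell exits at end of input,
# so the application finishes and its event log lands in hdfs://s105:8020/yinzhengjie/logs.
# /yinzhengjie/wordcount.txt is a placeholder path.
spark-shell --master spark://s101:7077 <<'EOF'
val counts = sc.textFile("hdfs://s105:8020/yinzhengjie/wordcount.txt").
  flatMap(_.split("\\s+")).
  map(word => (word, 1)).
  reduceByKey(_ + _)
counts.collect().foreach(println)
EOF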
5>.Check the history server page again
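Besides the web page, the event log written for the finished application can be checked directly on HDFS, under the directory configured in spark.eventLog.dir:

# Each finished application leaves one event log file (named after its application ID)
hdfs dfs -ls /yinzhengjie/logs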