Apache Spark Troubleshooting
In CDH 5.8.0
#java.lang.NoClassDefFoundError: parquet/hadoop/ParquetOutputCommitter
spark-shell
Spark context available as sc (master = yarn-client, app id = application_1470984222577_0004).
java.lang.NoClassDefFoundError: parquet/hadoop/ParquetOutputCommitter
at org.apache.spark.sql.SQLConf$.<init>(SQLConf.scala:319)
at org.apache.spark.sql.SQLConf$.<clinit>(SQLConf.scala)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:85)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:77)
at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1038)
at $iwC$$iwC.<init>(<console>:15)
at $iwC.<init>(<console>:24)
at <init>(<console>:26)
at .<init>(<console>:30)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1045)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1326)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:821)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:852)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:800)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:133)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:305)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:160)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1064)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: parquet.hadoop.ParquetOutputCommitter
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 52 more
<console>:16: error: not found: value sqlContext
import sqlContext.implicits._
^
<console>:16: error: not found: value sqlContext
import sqlContext.sql
^
This error occurs when accessing CDH Spark via a local spark-shell in a CDH-based environment.
To fix it, download Apache Parquet from the CDH tarball page and add the file
/usr/local/parquet/current/parquet-hadoop/target/parquet-hadoop-1.5.0-cdh5.8.0.jar
to the spark.driver.extraClassPath option in spark-defaults.conf.
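For reference, a minimal spark-defaults.conf sketch; the conf file location is an example (it varies by install), and the jar path is the one from above:

# e.g. /etc/spark/conf/spark-defaults.conf (location varies by install)
# If spark.driver.extraClassPath is already set, append this jar with a ':' separator.
spark.driver.extraClassPath /usr/local/parquet/current/parquet-hadoop/target/parquet-hadoop-1.5.0-cdh5.8.0.jar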
#Pending Spark shell job that keeps logging "Failed to connect to driver at xxx, retrying ..."
ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.99.1:55546, retrying ...
ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.99.1:55546, retrying ...
ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.99.1:55546, retrying ...
ERROR yarn.ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Failed to connect to driver!
at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:484)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:345)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:187)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:653)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:651)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:674)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
INFO util.ShutdownHookManager: Shutdown hook called
As the log above shows, the YARN ApplicationMaster (AM) keeps trying to reach the driver at an internal IP that is not the local IP spark-shell is actually running on; since that address is not the network interface spark-shell is serving on, the connection keeps failing and the AM eventually gives up.
Running Docker on a Mac, or using VM tools, creates virtual network interfaces.
spark-shell can pick up the IP of one of those virtual interfaces and register it with the AM, which leaves the Spark job pending in YARN.
export SPARK_LOCAL_IP=<Real IP Address>
Setting this pins the local IP that the Spark driver binds to and advertises.
Registering it in something like ~/.bash_profile resolves the issue.
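A minimal ~/.bash_profile sketch, assuming a Mac whose physical interface is en0 (an assumption; check with `ifconfig` and substitute your interface, or hard-code the address):

# ~/.bash_profile
# Pin the Spark driver to the host's real address so the YARN AM can reach it.
# `ipconfig getifaddr en0` prints en0's IPv4 address on macOS; en0 being the
# physical interface is an assumption for this sketch.
export SPARK_LOCAL_IP=$(ipconfig getifaddr en0)

After sourcing the profile, a new spark-shell should advertise the real IP to the AM instead of a virtual interface's address.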