将OFF_HEAP存储与Spark 1.4.0和Tachyon 0.6.4一起使用时出错


70

我正在尝试使用Spark 1.4.0和Tachyon 0.6.4上的堆外存储来保持我的RDD,如下所示:

val a = sqlContext.parquetFile("a1.parquet")
a.persist(org.apache.spark.storage.StorageLevel.OFF_HEAP)
a.count()

之后,我得到以下异常。

有什么想法吗?

15/06/16 10:14:53 INFO : Tachyon client (version 0.6.4) is trying to connect master @ localhost/127.0.0.1:19998
15/06/16 10:14:53 INFO : User registered at the master localhost/127.0.0.1:19998 got UserId 3
15/06/16 10:14:53 INFO TachyonBlockManager: Created tachyon directory at /tmp_spark_tachyon/spark-6b2512ab-7bb8-47ca-b6e2-8023d3d7f7dc/driver/spark-tachyon-20150616101453-ded3
15/06/16 10:14:53 INFO BlockManagerInfo: Added rdd_10_3 on ExternalBlockStore on localhost:33548 (size: 0.0 B)
15/06/16 10:14:53 INFO BlockManagerInfo: Added rdd_10_1 on ExternalBlockStore on localhost:33548 (size: 0.0 B)
15/06/16 10:14:53 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 5710423667942934352
org.apache.spark.storage.BlockNotFoundException: Block rdd_10_3 not found
    at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:306)
    at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
    at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:57)
    at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:114)
    at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)

我也尝试过使用文本文件进行同样的操作,并且能够将其保存在Tachyon中。问题在于持久保留最初从镶木地板读取的DataFrame。


我认为,您在构建后添加了这些方法,或者可能是,存在一些构建错误。尝试重建。
2015年

我还没有自己添加它们。Spark在将RDD保留到
Tachyon的

我不知道tachyon。您首先检查依赖项版本,是否兼容或缺少某些依赖项。
2015年

1
您可以在此处发布代码吗?
Paul H.

1
您确定这与堆外分配有关吗?在这种情况下,可能与序列化问题有关。对代码也很感兴趣。
Adrien M.

Answers:



0

现在已解决此问题。我现在可以确认使用Spark 1.5和Tachyon 0.7可以正常工作

By using our site, you acknowledge that you have read and understand our Cookie Policy and Privacy Policy.
Licensed under cc by-sa 3.0 with attribution required.