tfrecord write results in no data but no error #46
Open
Description
Hi -- I am trying to use spark-tfrecord with Spark 3.1.2, but the files written have no data.
- Spark 3.1.2
- Python 3.8.10
- Java 1.8.0
- Scala 2.12.10
I'm using the latest version available from the Maven repository:
<dependency>
<groupId>com.linkedin.sparktfrecord</groupId>
<artifactId>spark-tfrecord_2.12</artifactId>
<version>0.3.4</version>
</dependency>
Following the pyspark example from the README, but simplified further:
from pyspark.sql.types import StructType, StructField, IntegerType, FloatType, StringType

# Assumes `spark` is an existing SparkSession (e.g. from the pyspark shell)
path = "/tmp/test-output.tfrecord"
fields = [
    StructField("a", IntegerType()),
    StructField("b", FloatType()),
    StructField("c", StringType()),
]
schema = StructType(fields)
test_rows = [
    [1, 0.5, 'x'],
    [2, 1.5, 'y'],
    [3, 2.5, 'z'],
]
rdd = spark.sparkContext.parallelize(test_rows)
df = spark.createDataFrame(rdd, schema)
df.show()
Outputs:
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|0.5|  x|
|  2|1.5|  y|
|  3|2.5|  z|
+---+---+---+
Saving the Spark DataFrame as tfrecord does not throw an error.
path = "/tmp/test-output.tfrecord/"
df.write.mode("overwrite").format("tfrecord").option("recordType", "Example").save(path)
But the directory only has a _SUCCESS flag and a crc file, no data.
ls -la /tmp/test-output.tfrecord/
total 12
drwxr-xr-x. 2 build build 4096 Feb 19 19:00 .
drwxrwxrwx. 11 root root 4096 Feb 19 19:00 ..
-rw-r--r--. 1 build build 0 Feb 19 19:00 _SUCCESS
-rw-r--r--. 1 build build 8 Feb 19 19:00 ._SUCCESS.crc
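The same check can be scripted, which may help anyone trying to reproduce this. The following is only a minimal sketch: the helper name `list_part_files` is hypothetical, and it assumes the output directory is on the local filesystem and that Spark names its data files with the usual `part-` prefix. The demo recreates the directory layout shown above in a temporary directory rather than touching `/tmp/test-output.tfrecord/`.

```python
import os
import tempfile


def list_part_files(output_dir):
    """Return (name, size_in_bytes) for each Spark data file in output_dir.

    Hypothetical helper. Assumes a local path and Spark's usual `part-`
    prefix; marker files such as _SUCCESS and hidden .crc files are skipped.
    """
    parts = []
    for name in sorted(os.listdir(output_dir)):
        full = os.path.join(output_dir, name)
        if name.startswith("part-") and os.path.isfile(full):
            parts.append((name, os.path.getsize(full)))
    return parts


if __name__ == "__main__":
    # Recreate the layout from the listing above: only marker files, no data.
    with tempfile.TemporaryDirectory() as demo_dir:
        open(os.path.join(demo_dir, "_SUCCESS"), "w").close()
        with open(os.path.join(demo_dir, "._SUCCESS.crc"), "wb") as fh:
            fh.write(b"\x00" * 8)
        print(list_part_files(demo_dir))  # prints []
```

An empty list here means the write job committed without producing any data files, which matches what `ls` shows.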
And of course, trying to read the file fails.
spark.read.format('tfrecord').option('recordType', 'Example').load(path).show()
Error:
AnalysisException: Unable to infer schema for TFRECORD. It must be specified manually.
Let me know if there is more system/config information that could help to debug this.
FWIW, I ran into exactly the same situation when testing spark-tensorflow-connector, which I was building from source. I assumed something was wrong with my dependencies and decided to try this project instead.
thanks,
Dennis