I am trying to write a Spark DataFrame out in TFRecord format:
df.write.mode("overwrite").format("tfrecord").option("recordType", "tfrecords").save(outputPath + '/tf-records/')
I am running on a GCP Dataproc cluster, which ships with Spark 3.1.2, and I am using the spark-tfrecord jar spark-tfrecord_2.12-0.3.4.jar.
The write operation fails with the following error:
22/01/21 05:33:13 ERROR org.apache.spark.util.Utils: Aborting task
java.lang.IllegalArgumentException: Unsupported recordType tfrecords: recordType can be Example or SequenceExample
at com.linkedin.spark.datasources.tfrecord.TFRecordOutputWriter.write(TFRecordOutputWriter.scala:33)
at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:140)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:278)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:286)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:210)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
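Going only by the exception text, the writer seems to accept just Example or SequenceExample for recordType, so the "tfrecords" value I passed is probably the problem. Below is a minimal sketch of what I would try next. It assumes the option value must match the error message exactly; check_record_type and write_tfrecords are hypothetical helper names of mine, not part of spark-tfrecord, and I have not confirmed this against the library documentation.

```python
# Accepted values copied from the error message:
# "recordType can be Example or SequenceExample".
ALLOWED_RECORD_TYPES = ("Example", "SequenceExample")


def check_record_type(record_type):
    """Hypothetical helper: fail on the driver before any task starts."""
    if record_type not in ALLOWED_RECORD_TYPES:
        raise ValueError(
            "Unsupported recordType %r; expected one of %s"
            % (record_type, ", ".join(ALLOWED_RECORD_TYPES))
        )
    return record_type


def write_tfrecords(df, output_path, record_type="Example"):
    """Same write call as in the question, with a validated recordType."""
    (
        df.write.mode("overwrite")
        .format("tfrecord")
        .option("recordType", check_record_type(record_type))
        .save(output_path + "/tf-records/")
    )
```

With this, write_tfrecords(df, outputPath) would write with recordType set to Example, while passing "tfrecords" raises a ValueError immediately instead of aborting a task on the cluster.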
I would appreciate any input on this issue. Thanks.