Home > Uncategorized > Spark SQL Transfer from Database to Hadoop

Spark SQL Transfer from Database to Hadoop

Hadoop can store structured and unstructured data. That’s the benefit of schemaless approach. However lots of our customer or data resides in Relation Database. We need to take this first into Hadoop so we can query and transform the data inside Hadoop cluster and optimizing the parallelism.

For transfering the data from relational database to hadoop usually you will use Apache Sqoop for this one. However there’s some limitation and weakness on the data type preserve-ration. Especially around datetime or timestamp. That’s why i suggest to use Spark SQL for this stuff. Spark can also be used as ETL Tools !!

Spark can transform relational database into parquet and avro data structure. So it will safe space and compress it with snappy. You can find the good explanation why we use avro and parquet on the net.

Please refer to the blog post below for transfering the data via Spark with Avro and Parquet as data file.




Categories: Uncategorized
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: