Archive
Creating scala uber jar executable
We want a single jar that contains all of the libraries so it can be run as a standalone tool. This can be done with sbt-assembly.
If you are using sbt with IntelliJ IDEA, you should add assembly.sbt under the project directory.
Enter the following line into assembly.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.4")
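If you plan to run the jar with java -jar, it also helps to tell sbt-assembly which main class to put in the manifest, and optionally how to resolve duplicate files pulled in by your dependencies. Here is a minimal sketch of the relevant build.sbt settings, assuming a hypothetical main class com.example.Main; the merge strategy shown is the common default from the sbt-assembly documentation.

    // Main class written into the jar manifest so java -jar works.
    mainClass in assembly := Some("com.example.Main")

    // Resolve duplicate files from dependencies: drop META-INF, keep the default strategy otherwise.
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case x =>
        val oldStrategy = (assemblyMergeStrategy in assembly).value
        oldStrategy(x)
    }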
Refresh the sbt project from the SBT pane (usually docked on the right-hand side). If you don't see the SBT pane, you can activate it from View -> Tool Windows -> SBT.
It will download the plugin to your project.
After it finishes downloading, you can run the sbt assembly command from the terminal.
After that you can execute your jar with the following command. Adjust the jar name and location as needed.
java -jar target\scala-2.10\hive-jdbc-assembly-1.0.jar
Congrats, you now have a standalone tool in a single uber jar.
Cheers
Spark SQL Transfer from Database to Hadoop
Hadoop can store structured and unstructured data; that's the benefit of a schema-less approach. However, a lot of our customers' data resides in relational databases. We need to bring that data into Hadoop first so we can query and transform it inside the Hadoop cluster and take advantage of its parallelism.
For transferring data from a relational database to Hadoop you would usually use Apache Sqoop. However, Sqoop has some limitations and weaknesses around preserving data types, especially datetime and timestamp columns. That's why I suggest using Spark SQL for this. Spark can also be used as an ETL tool!
Spark can transform relational database tables into Parquet and Avro files and compress them with Snappy, which saves space. You can find good explanations of why we use Avro and Parquet on the net.
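As an illustration, here is a minimal sketch of a Spark job that reads a table over JDBC and writes it to HDFS as Snappy-compressed Parquet. The connection URL, table name, credentials and output path are placeholders, and the JDBC driver for your database must be on the classpath.

    import org.apache.spark.sql.SparkSession

    object JdbcToParquet {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("jdbc-to-parquet")
          .getOrCreate()

        // Read a table over JDBC; URL, table and credentials below are placeholders.
        val df = spark.read
          .format("jdbc")
          .option("url", "jdbc:sqlserver://dbhost:1433;databaseName=sales")
          .option("dbtable", "dbo.customers")
          .option("user", "etl_user")
          .option("password", "secret")
          .load()

        // Write to HDFS as Parquet, compressed with Snappy.
        df.write
          .option("compression", "snappy")
          .parquet("hdfs:///data/raw/customers")

        spark.stop()
      }
    }

Writing Avro instead works the same way through the spark-avro package; the posts below cover both variants in more detail.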
Please refer to the blog posts below for transferring data via Spark with Avro and Parquet as the data files.
https://weltam.wordpress.com/2017/03/27/spark-sql-transfer-from-sql-server-to-hadoop-as-parquet/
https://weltam.wordpress.com/2017/03/27/extract-rdbms-as-avro/
Cheers
Spark SQL Transfer from SQL Server to Hadoop as Parquet
Spark SQL Transfer from SQL Server to Hadoop as Avro
Big Data, Apache Hadoop and Cloudera
Big data is everywhere and people are talking about it. We need to be prepared to embrace this wave. If you search the internet you will find that Hadoop sits at the center of big data; Hadoop is like an operating system for big data. So let's get started and meet Apache Hadoop in action.
The most convenient introduction to Hadoop is the virtual machine provided by Cloudera. Cloudera is one of the biggest vendors that bundles the Hadoop ecosystem into one package. It also provides monitoring and an easier way to manage your cluster, and deploying a whole cluster with this package is straightforward. Let's continue with downloading the package.
Please download the Cloudera QuickStart VM from this link; it is almost 4 GB. If you would like to register and find the latest installer, please go to this site.
Make sure that you have VMware Player installed on your machine. You can find the installer on this site.
Extract the QuickStart files and open them from VMware.
You need 8 GB of RAM and 2 virtual CPUs. Please configure that in the virtual machine settings.
Run the virtual machine and, after it finishes booting, execute "Launch Cloudera Express" from the desktop. Please be patient until all services have started.
If you haven't opened the browser yet, open it and go to Cloudera Manager from the bookmarked address.
Log in with username: cloudera and password: cloudera.
Make sure that all services are running. If some services are down, please start them manually.
You can also access Hue (Hadoop User Experience) from the browser bookmarks. You can log in with the same username and password you used for Cloudera Manager.
If you want to make sure the installation is correct, you can do some health checking by running this test.
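As a simple additional sanity check (just an illustration, not the official test), you can list the HDFS root from a small Scala program using the Hadoop FileSystem API; the object name here is arbitrary.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HdfsSmokeTest {
      def main(args: Array[String]): Unit = {
        // Picks up core-site.xml / hdfs-site.xml from the classpath (e.g. /etc/hadoop/conf).
        val conf = new Configuration()
        val fs = FileSystem.get(conf)
        // If this prints the top-level HDFS directories, HDFS is up and reachable.
        fs.listStatus(new Path("/")).foreach(status => println(status.getPath))
      }
    }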
Congratulations, you have successfully run a single-node cluster with the Cloudera distribution.
For more tutorials, you can download the materials from this workshop. Thanks to Gandhi Manalu and Institut Teknologi Del.
If you still have some spirit and energy left, please follow the comprehensive Cloudera tutorial on this site.
Cheers