Archive for the ‘Uncategorized’ Category

Spark SQL Transfer from Database to Hadoop

March 27, 2017

Hadoop can store structured and unstructured data. That’s the benefit of a schemaless approach. However, much of our customer data resides in relational databases. We first need to bring that data into Hadoop so we can query and transform it inside the Hadoop cluster and take advantage of its parallelism.

To transfer data from a relational database to Hadoop, you would usually use Apache Sqoop. However, it has some limitations and weaknesses in data type preservation, especially around datetime and timestamp columns. That’s why I suggest using Spark SQL for this job. Spark can also be used as an ETL tool!

Spark can read tables from a relational database and write them out as Parquet or Avro files. These formats save space, and the files can be compressed with Snappy. You can find good explanations of why we use Avro and Parquet on the net.
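
As a rough illustration of the idea, here is a minimal PySpark sketch, assuming a SQL Server source read over JDBC. The host, database, table, credentials and output path are placeholders made up for this example, and the SQL Server JDBC driver jar is assumed to be on Spark’s classpath. It is a sketch under those assumptions, not the exact code from the linked posts.

```python
from typing import Dict


def jdbc_options(host: str, port: int, database: str, table: str,
                 user: str, password: str) -> Dict[str, str]:
    """Build the option map for Spark's generic JDBC data source."""
    return {
        # Standard SQL Server JDBC URL format.
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def transfer_to_parquet(options: Dict[str, str], target_path: str) -> None:
    """Read one table over JDBC and write it as Snappy-compressed Parquet."""
    # Imported here so the helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdbms-to-parquet").getOrCreate()
    try:
        table = spark.read.format("jdbc").options(**options).load()
        (table.write
              .mode("overwrite")
              .option("compression", "snappy")
              .parquet(target_path))
    finally:
        spark.stop()


if __name__ == "__main__":
    # Placeholder values: replace them with your own server and paths.
    opts = jdbc_options("dbhost", 1433, "sales", "dbo.Orders", "etl_user", "secret")
    print(opts["url"])
    # transfer_to_parquet(opts, "hdfs:///data/sales/orders")  # needs a running cluster
```

The column types in the Parquet output come from the JDBC metadata of the source table, so it is worth checking the datetime and timestamp columns after the first run.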

Please refer to the blog posts below for transferring data via Spark with Avro and Parquet as the file format.

https://weltam.wordpress.com/2017/03/27/spark-sql-transfer-from-sql-server-to-hadoop-as-parquet/

https://weltam.wordpress.com/2017/03/27/extract-rdbms-as-avro/

Cheers

Categories: Uncategorized

Spark SQL Transfer from SQL Server to Hadoop as Parquet

March 27, 2017

Spark SQL Transfer from SQL Server to Hadoop as Avro

March 27, 2017

Big Data, Apache Hadoop and Cloudera

March 26, 2017

Big data is everywhere, and people are talking about it. We need to be prepared to embrace this wave. If you search the internet, you will find that Hadoop is at the center of big data. Hadoop is an operating system for big data. So let’s get started and meet Apache Hadoop in action.

The most convenient way to get introduced to Hadoop is to use the virtual machine provided by Cloudera. Cloudera is one of the biggest vendors that bundle the Hadoop ecosystem in one package. It also provides monitoring and a manager that makes it easier to run your cluster. Deploying the whole cluster with this package is really easy too. Let’s continue by downloading the package.

Please download Cloudera Quickstart from this link. It is almost 4 GB. If you would like to register and find the latest installer, please go to this site.

Make sure that you have installed VMWare Player on your machine. You can find the installer on this site.

Extract the Quickstart files and open them in VMWare.

You need at least 8 GB of RAM and 2 virtual CPUs. Please configure these in the virtual machine settings.

Run the virtual machine and, after it finishes booting, execute “Launch Cloudera Express” from the Desktop. Please be patient until all services have started.

If you haven’t opened the browser yet, please open it and go to Cloudera Manager from the bookmarked address.

Log in with username: cloudera, and password: cloudera.

Make sure that all services are running. If some services are down, please start them manually.

You can also access HUE (Hadoop User Experience) from the browser bookmarks. You can log in with the same username and password you entered for Cloudera Manager.

If you want to make sure the installation is correct, you can run a health check by doing this test.

Congratulations, you have successfully run a single-node cluster with the Cloudera distribution. 🙂

For more tutorials, you can download this workshop. Thanks to Gandhi Manalu and Institut Teknologi Del.

If you still have some spirit and energy left, please follow the comprehensive Cloudera tutorial from this site.

Cheers

Getting Started with Programming on Apache Hadoop

February 25, 2017

The words “big data” are becoming more and more widespread, and everyone is talking about them. We must quickly get ready for very fast technological change. When we talk about big data, we never stray far from Apache Hadoop. Apache Hadoop is an operating system for big data. Let’s begin this first lesson by getting acquainted with Apache Hadoop.

The fastest way to get acquainted with Hadoop is to use the virtual machine provided by Cloudera. Cloudera is one of the vendors that supports and bundles the Hadoop ecosystem in one package. This makes deployment easier than installing the Hadoop components one by one.

Please download Cloudera Quickstart from this link. It is fairly large, about 4 GB.

Make sure VMWare Player is installed as well. If you have not installed VMWare yet, please download it from this link.

Extract the Quickstart file and open it with VMWare.

To run Cloudera Express we need at least 8 GB of RAM and 2 virtual CPUs. Please set these in the Cloudera Quickstart virtual machine settings.

Run the virtual machine and, once it has finished booting, execute “Launch Cloudera Express” from the Desktop. Please be patient while all the services start.

Open your browser and go to Cloudera Manager.

Log in with username: cloudera, password: cloudera.

Make sure all services are running. If they are not running, you can start them by choosing Start from the Cloudera host drop-down menu.

If you want to make sure the installation really succeeded, you can test it as follows.

Point your browser to Hue; you can find it in the web browser bookmarks. Log in with the same username and password.

Congratulations, you have successfully run your single-node cluster with the Cloudera distribution. 🙂

For a more detailed tutorial, you can download the following workshop. Thanks to Gandhi Manalu and Institut Teknologi Del.

If you are still enthusiastic and even more curious, please follow this tutorial.

Cheers

.NET Core Microservices using GeekseatBus

October 22, 2016

GeekseatBus is a simple message bus that can be used to create microservices in .NET.

Here’s the background on why we built this ourselves.

Background

Many microservices today favour REST APIs for communication between services. At Geekseat we take a different approach to microservices and avoid request-response between services. This aligns with the SOA tenet of autonomous components. Your component can’t be autonomous if it still uses request-response: if one service dies, the services that call it die too. This creates temporal coupling between services.

We are big fans of the Udi Dahan style of microservices, as it enables high cohesion and loose coupling in our big system. If you want to learn more, you can register for a free 2-day course from here.

The central requirement of this approach is a message bus. The message bus is used for fire-and-forget as well as publish-and-subscribe communication between services, so it promotes loose coupling between components.

Geekseat has been researching simple message buses on RabbitMQ. We found NServiceBus and MassTransit, but neither of them can be used on Linux. That’s a big problem for us, as our backend is currently written on .NET Core (cross-platform .NET). So we decided to create our own implementation of a message bus on top of RabbitMQ.

We have published GeekseatBus to NuGet. This is a big first step for us as we start to open source our infrastructure for microservices.
We are great believers in keeping things simple, with minimal configuration. As an agile company, we like to see our simple stuff work in production. GeekseatBus also relies on convention over configuration, which makes it easier to use.

Getting Started

OK, let’s get started. Here’s the diagram of what we are going to achieve in this head-first tour of GeekseatBus.

[image: geekseatbus_services]

From the diagram above we can see that we have 2 services: Order Service and Billing Service. Order Service has 2 components, Order Client and Order Service (Server).

Geekseat.BillingService subscribes to the OrderPlaced event published by OrderService and does its own thing by billing the customer according to the product ordered.
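
To make the two messaging styles concrete before touching the real library, here is a small in-memory sketch in Python. It is not the GeekseatBus API: the `InMemoryBus` class, its method names and the `product` field are hypothetical and exist only for illustration, and the calls are synchronous, whereas a real bus would put the messages on RabbitMQ queues. It only shows the shape of the flow: a PlaceOrder command goes to exactly one handler, and the resulting OrderPlaced event fans out to every subscriber.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Any], None]


@dataclass
class PlaceOrder:
    """Command: sent to exactly one service (fire and forget)."""
    product: str  # hypothetical field, for illustration only


@dataclass
class OrderPlaced:
    """Event: published to every interested subscriber."""
    product: str  # hypothetical field, for illustration only


class InMemoryBus:
    """Hypothetical stand-in for a message bus, not the GeekseatBus API."""

    def __init__(self) -> None:
        self._command_handlers: Dict[type, Handler] = {}
        self._subscribers: DefaultDict[type, List[Handler]] = defaultdict(list)

    def register_handler(self, command_type: type, handler: Handler) -> None:
        self._command_handlers[command_type] = handler

    def subscribe(self, event_type: type, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def send(self, command: Any) -> None:
        # Fire and forget: the sender gets nothing back from the handler.
        self._command_handlers[type(command)](command)

    def publish(self, event: Any) -> None:
        # Publish and subscribe: every subscriber of this event type is called.
        for handler in self._subscribers[type(event)]:
            handler(event)


def run_demo() -> List[str]:
    bus = InMemoryBus()
    log: List[str] = []

    def handle_place_order(command: PlaceOrder) -> None:
        log.append(f"OrderService: order placed for {command.product}")
        bus.publish(OrderPlaced(product=command.product))

    def bill_customer(event: OrderPlaced) -> None:
        log.append(f"BillingService: billing customer for {event.product}")

    bus.register_handler(PlaceOrder, handle_place_order)
    bus.subscribe(OrderPlaced, bill_customer)
    bus.send(PlaceOrder(product="keyboard"))
    return log


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

Note that the order side never calls the billing side directly. It only publishes the event, which is what keeps the two services loosely coupled.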

Fire and Forget Demo

Now let’s open our beloved Visual Studio IDE.

Create a solution and a .NET Core console application for OrderService. This will be the OrderService endpoint. This service will handle the PlaceOrder command and publish the OrderPlaced event.

[image: create_order_services]

Add a reference to the GeekseatBus NuGet package to the Geekseat.OrderService project.

[image: geekseatbus-nuget]

Create a class library project for messages. We have 2 messages, the PlaceOrder command and the OrderPlaced event. We have a convention for naming the project: the project name should contain the service name as a prefix. For example, if your service name is Geekseat.OrderService, then your message project should be Geekseat.OrderService.Messages.

[image: orderservice_messages]

You should also create two directories for events and commands, like this. This gives you the namespaces for events (Geekseat.OrderService.Messages.Events) and for commands (Geekseat.OrderService.Messages.Commands).

Create the PlaceOrder command in the Commands folder and the OrderPlaced event in the Events folder, and delete Class1.cs.

Make the content of PlaceOrder.cs like this.

[image: placeorder]

And the content of OrderPlaced.cs like this.

[image: orderplaced]

Please make sure that your namespaces follow the conventions we mentioned above.

Create another endpoint for Geekseat.OrderClient.

[image: ordercilent]

Add a GeekseatBus reference to Geekseat.OrderClient. Also add a reference to Geekseat.OrderService.Messages to both OrderClient and OrderService.

[image: addrefmessages]

Now we can start creating a message handler for PlaceOrder in OrderService.

[image: placeorderhandler]

In Program.cs (in OrderService) you should start the bus with this code. Really simple startup, right?

[image: programorderservice]

You can run the OrderService to see which conventions are used for creating the queue and exchange in RabbitMQ. Basically, each service has its own queue (a single queue that handles multiple message types). This queue can be bound to the events the service is interested in.

A service can also create an exchange for each event it publishes. You can check that OrderService has a queue named Geekseat.OrderService and an exchange named Geekseat.OrderService.Messages.Events.OrderPlaced.

[image: queue]

[image: exchange]
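
The naming conventions described so far can be summarised in a few lines of Python. This is only a sketch of the conventions as stated in this post (the queue is named after the service, the exchange after the full name of the event type); the function names are hypothetical and are not part of GeekseatBus.

```python
def messages_project(service_name: str) -> str:
    """The messages project carries the service name as its prefix."""
    return f"{service_name}.Messages"


def queue_name(service_name: str) -> str:
    """Each service gets one queue, named after the service itself."""
    return service_name


def event_exchange_name(service_name: str, event_name: str) -> str:
    """Each published event gets an exchange named after its full type name."""
    return f"{messages_project(service_name)}.Events.{event_name}"


if __name__ == "__main__":
    service = "Geekseat.OrderService"
    print(queue_name(service))
    print(event_exchange_name(service, "OrderPlaced"))
```

For the order service in this walkthrough, the two printed names match the queue and exchange mentioned above.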

Let’s now send a message to our service. For this we will concentrate on Geekseat.OrderClient.

[image: orderclient_program]

Run both OrderService and OrderClient, and press Enter to send the message from the client.

[image: runnning]

Voila, it receives the message!

Publish and Subscribe Demo

Now we will publish an OrderPlaced event from OrderService. The publishing will be handled in PlaceOrderHandler. We will inject IGsBus into this handler and publish from it.

[image: orderplacedhandler]

OK. Now let’s create a subscriber for that event. We will leverage exchanges in RabbitMQ, but of course this will be transparent to the user.

Create a new console project, Geekseat.BillingService. Add references to the messages project and GeekseatBus. Create the message handler OrderPlacedHandler.

[image: billservicehandler]

Add the service startup for BillingService and we’re done.

[image: billingstartup]

Run all the console applications (OrderService, BillingService and OrderClient). Enter the message from OrderClient and you can see that the event has been published to BillingService.

[image: console_final.png]

It works!

We have open sourced this library on GitHub. You can download it, experiment and send a pull request!

Happy Microservicing 🙂

Cheers

Publishing to Maven Central

September 5, 2016

You can follow the tutorial for publishing your library in the following series of YouTube videos. You will need to submit a request via Sonatype JIRA and wait for approval. It’s explained in the series.

However, if you want the easiest way, you can use Bitbucket Pipelines. You can see how to do that step by step in this tutorial. It’s still in beta but looks promising. You should give it a try and ask for the trial. Yes, you need to request the trial first and wait for approval.

The missing part in the tutorial above is how to generate the key. Here’s how you can do that on a Windows machine.

  1. Download GnuPG for Windows here
  2. Set up the key for signing
    1. Generate the key
      gpg --gen-key
    2. List the keys to find the id
      gpg2 --list-keys
      gpg2 --list-secret-keys

      Let’s say that the id is 4C7CF393. We will use this key in the subsequent steps.

    3. Publish the public key
      gpg2 --keyserver hkp://pool.sks-keyservers.net --send-keys 4C7CF393
    4. Export the private key
      gpg -a --export-secret-key 4C7CF393 > private-key.gpg
    5. Download OpenSSL from http://gnuwin32.sourceforge.net/packages/openssl.htm for encryption
    6. Add the binaries to your PATH environment variable
    7. Encrypt the key
      openssl aes-256-cbc -in private-key.gpg -out private-key.gpg.enc -pass pass:somesecretpwd
  3. Upload the encrypted key to your repository. You can base your repository on a copy of this repository and replace the private key there.
  4. Copy your source code and other files there.
  5. Replace the username and password configuration as described in the README of this repository
  6. Edit the pom.xml so it publishes your library
  7. You can follow the rest of the tutorial from this
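
The encrypted key has to be decrypted again before it can be used for signing, presumably inside the build pipeline. The Python sketch below builds the OpenSSL command from the encryption step and its counterpart with the `-d` (decrypt) flag. The helper names are made up for this example, and the file names and passphrase are simply the ones used in the list above. Treat it as a sketch of the idea, not as the linked tutorial’s actual script.

```python
import shutil
import subprocess
from typing import List

CIPHER = "aes-256-cbc"  # same cipher as in the "Encrypt the key" step


def encrypt_command(plain_path: str, encrypted_path: str, passphrase: str) -> List[str]:
    """Argument list equivalent to the openssl command in the list above."""
    return ["openssl", CIPHER, "-in", plain_path, "-out", encrypted_path,
            "-pass", f"pass:{passphrase}"]


def decrypt_command(encrypted_path: str, plain_path: str, passphrase: str) -> List[str]:
    """Same command with -d added, which turns encryption into decryption."""
    return ["openssl", CIPHER, "-d", "-in", encrypted_path, "-out", plain_path,
            "-pass", f"pass:{passphrase}"]


def run(command: List[str]) -> None:
    """Run the command, failing early if openssl is not on the PATH."""
    if shutil.which(command[0]) is None:
        raise RuntimeError(f"{command[0]} was not found on the PATH")
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only prints the commands; call run(...) to actually execute them.
    print(" ".join(encrypt_command("private-key.gpg", "private-key.gpg.enc", "somesecretpwd")))
    print(" ".join(decrypt_command("private-key.gpg.enc", "private-key.gpg", "somesecretpwd")))
```

A passphrase given as `pass:...` is visible in the process list, so in a pipeline it is usually better to take it from a secured variable than to write it into the script.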

Happy Publishing !

Cheers