
How to copy to data from HDFS to Gemfire XD Memory

Hi, we are planning a Big Data implementation with Hadoop as the data store and GemFire XD as the in-memory database for real-time applications. How do we pull data into GemFire XD and push it back out? What is the best way to do it? Can we use SQLFire for that? If so, please advise.

Thanks a lot in advance.

Regards,
Venkat

Ram


16 comments


Hi Venkat!
In a nutshell, you can think of GemFireXD as SQLFire with Hadoop support, so the same clients you use with SQLFire work with GemFireXD (JDBC-compliant clients, ODBC for Linux, and so on).
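For instance, a first smoke test from the interactive shell could look like this (a sketch only; the host, port, and table names are hypothetical):

```sql
-- Sketch; from the GemFireXD interactive shell, a thin-client
-- connection looks just like it does in SQLFire:
connect client 'localhost:1527';

-- Plain SQL works as-is over the same client:
CREATE TABLE greetings (id INT PRIMARY KEY, msg VARCHAR(100));
INSERT INTO greetings VALUES (1, 'hello');
SELECT * FROM greetings;
```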

If you have other questions, feel free to ask.

Cheers

William Markito

Hi William,

Thanks for the immediate response. I've gone through the site below to some extent. Can we use SQLFire to load data from HDFS into the GemFire XD in-memory cluster and push any changed data back to HDFS? Let me give some history of what we are trying to achieve: we are migrating from OLTP to Hadoop as the data store, and from there we want to keep all the data in GemFire XD for real-time access by front-end applications. As transactions are updated, we push them back to GemFire XD, and we want to see the same changes in both HDFS and GemFire XD. Will SQLFire handle this whole process on its own, or do I need to write MapReduce jobs too? Please advise.

I have installed the single-node VM and am trying to play around with it, but I couldn't find ToursDB_Schema.sql. Could you point me to where I can find these example files?

Regards.

Ram

Hi Venkat,
As per your description, you only need GemFireXD and Pivotal HD (PHD). There is no need for SQLFire, since GemFireXD gives you the same functionality plus HDFS persistence. Please note that only GemFireXD has the HDFS persistence feature; it is not present in the SQLFire or GemFire products.

Just to give you an idea: in GemFireXD you would set up an HDFS store pointing to your Hadoop namenode, then create the tables using that HDFS store for persistence; by doing that, your CRUD operations will be persisted to HDFS.
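The setup above might be sketched roughly like this (a sketch assuming GemFireXD's CREATE HDFSSTORE DDL; the namenode URL, store, and table names are hypothetical):

```sql
-- Sketch; point the store at the Hadoop namenode...
CREATE HDFSSTORE txnStore
  NAMENODE 'hdfs://namenode-host:8020'
  HOMEDIR '/gfxd-data';

-- ...then attach tables to it so CRUD operations are persisted to HDFS.
CREATE TABLE transactions (
  id     INT PRIMARY KEY,
  amount DECIMAL(10,2)
) PARTITION BY PRIMARY KEY
  HDFSSTORE (txnStore);
```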

You can download a complete Virtual Machine (VM) from Pivotal Network that comes with PHD + GFXD configured and ready to try - https://network.gopivotal.com/products/pivotal-hd

William Markito

MapReduce jobs are not needed for CRUD operations, but you may want to implement them, for example, to iterate over historical data that is only available in the HDFS store...

William Markito

Got it. GemFireXD = GemFire + SQLFire with the HDFS persistence feature, so I can use GemFireXD (SQLFire) itself for CRUD operations. That makes my life easier, because development is much faster compared to the Spring framework or plain Java programming. Another nice feature I found in GemFireXD is support for procedures, though these look more like Java stored procedures and are not as rich as Oracle or Greenplum PL/SQL procedures. Correct me if I'm wrong. Just curious: is there any equivalent feature in GemFireXD that helps migrate Oracle PL/SQL procedures, or can I use PL/pgSQL? If so, will GemFireXD support PL/pgSQL?

Once again thanks a lot for answering my questions with patience.

Regards.

Ram

William,

GemFire XD to HDFS persistence is quite clear in the documentation, but we can't find anything related to HDFS to GemFire XD integration, which also seems to be one of Venkat's use cases, if I understand correctly.
GemFire XD is able to fetch data stored on HDFS that was written from GemFire XD (and evicted), but can it also fetch/sync data written by MapReduce jobs, for example?

Olivier Armand

Hi Venkat,
Functions and procedures in GemFireXD are Java-based, which, depending on the use case, is much more powerful and flexible than more traditional languages (PL/SQL, PL/pgSQL). That said, we currently don't have PL/SQL or PL/pgSQL on the roadmap for upcoming releases, but things may change... :)

William Markito

Hi Olivier!
Once you have persisted data through GemFireXD, you can read that data from traditional MapReduce jobs (including YARN-based ones) by using the GemFireXD RowInputFormat/RowOutputFormat classes. The link below has an example that illustrates this cycle.

http://gemfirexd.docs.gopivotal.com/latest/userguide/index.html?q=/latest/userguide/getting_started/topics/tutorial_mapred.html

Hope that helps.

William Markito

Thank you for your response. My use case is actually different: I have a MapReduce job producing data on HDFS (i.e. independent of GemFire XD) that I would like to fetch from GemFire XD.

Olivier Armand

Well Olivier, in that case, if you still want to use MR jobs to load data back into GemFireXD, you still have to use our formatters for the job output.

If you just want to read that data by other means, alternatives include:
- Writing a Java HDFS client that reads the data and loads it back into GFXD.
- Using GemFireXD parallel procedures (written in Java, for example) to distribute the load across a GemFireXD cluster.
- Using GemFireXD table functions - http://gemfirexd.docs.gopivotal.com/latest/userguide/index.html?q=/latest/userguide/data_management/table-functions/cdevspecialtabfuncs.html

All of these alternatives may have drawbacks depending on your use case: performance, data size, read/write capabilities, joins with other tables in the system... These are a few things to consider before making a decision.
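To illustrate the table-function route, a rough sketch (the function, class, column, and table names are all hypothetical, and the Java class returning a JDBC ResultSet would have to be written and deployed separately):

```sql
-- Sketch; Derby-style table function mapped to a hypothetical
-- Java static method that reads rows from an HDFS path.
CREATE FUNCTION readHdfsRows(path VARCHAR(255))
RETURNS TABLE (id INT, amount DECIMAL(10,2))
LANGUAGE JAVA
PARAMETER STYLE DERBY_JDBC_RESULT_SET
READS SQL DATA
EXTERNAL NAME 'com.example.HdfsReaders.readRows';

-- Pull the externally produced rows into a GemFireXD table:
INSERT INTO transactions
  SELECT s.* FROM TABLE (readHdfsRows('/data/mr-output')) s;
```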

Hope that helps.

William Markito

Do we have integration between GemFireXD and Pentaho or any other ETL tool? For example, I load the data into HDFS using Pentaho/Talend/... and then create the HDFS stores for the GemFireXD tables. For the everyday interfaces, is it better to load the data into HDFS (accessing the full HDFS data with the queryHDFS=true hint) or into GemFireXD? If I need to load into GemFireXD, what is the best way? Please advise.

Ram

Hi Venkat,
Not at this moment.

Also, please note that our recommendation is to read/write from/to HDFS using GemFireXD's native persistence mechanism, precisely because we use our own optimized format, which is the only way we can support queryHDFS, among other features such as high-speed writes, etc.
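As a sketch of the per-query hint mentioned above (the table name is hypothetical; note that the hint is a SQL line comment, so it must be terminated by a newline before the statement's semicolon):

```sql
-- Sketch; without the hint, queries only see in-memory (operational) data.
-- With it, the full data set persisted in the HDFS store is scanned too.
SELECT * FROM transactions -- GEMFIREXD-PROPERTIES queryHDFS=true
;
```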

William Markito

Hi,

We are using the Hortonworks Hadoop distribution (HDP) and GemFire for caching. How do we integrate with HDP to get the data from HDFS and load it into the GemFireXD cache?

Regards,
R

Ramana G