Does Hawq query both HDFS and in-memory data, or only the HDFS data?

If we query a Hawq external table from the psql command prompt, does it query only the data persisted to HDFS (the data that meets the eviction criteria), or both the in-memory data and the HDFS data?

arghya sadhu Answered

When Hawq queries an external table that points to GemFire XD, it accesses only the data that has been persisted to HDFS (this data may also reside in memory). You can ensure that all data has been flushed by invoking SYS.HDFS_FLUSH_QUEUE. Also, set the CHECKPOINT parameter on the external table definition to determine whether Hawq reads only compacted rows (the default) or raw updates, which may include duplicates, etc.
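
To make the difference between compacted rows and raw updates concrete, here is a minimal Python sketch. It is a toy model, not GemFire XD or Hawq code: the record layout `(key, value, op)` and the function names are assumptions made up for illustration, and it does not show which CHECKPOINT value selects which view.

```python
# Toy model only: not GemFire XD or Hawq code.
# Each raw record is (key, value, op) in write order (hypothetical layout).
raw_updates = [
    ("k1", "a", "insert"),
    ("k2", "b", "insert"),
    ("k1", "c", "update"),
    ("k2", None, "delete"),
]


def read_raw(records):
    """Raw view: every persisted change, so a key can appear more than once."""
    return list(records)


def read_compacted(records):
    """Compacted view: only the latest surviving value per key."""
    latest = {}
    for key, value, op in records:
        if op == "delete":
            latest.pop(key, None)   # a delete removes the key from the view
        else:
            latest[key] = value     # later writes overwrite earlier ones
    return latest


raw_view = read_raw(raw_updates)
compacted_view = read_compacted(raw_updates)
print(len(raw_view))       # number of raw change records
print(compacted_view)      # one entry per surviving key
```

In this model the raw view contains `k1` twice, which is the kind of duplicate the answer warns about, while the compacted view keeps a single current row per key.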

Anthony Baker 0 votes

Every insert/update/delete is both written to in-memory storage (and replicated to preserve data redundancy) and queued for an asynchronous HDFS write. The queue is flushed automatically every 60 seconds or when the queue size reaches the BatchSize attribute on HDFSSTORE (32 MB by default). The queue can also be flushed manually using SYS.HDFS_FLUSH_QUEUE.
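
The write path described above can be sketched as a small Python model. This is an illustration under stated assumptions, not the GemFire XD implementation: the class `ToyStore`, its methods, and the explicit `now` timestamps are hypothetical, and only the 60-second interval and 32 MB batch size come from the answer.

```python
class ToyStore:
    """Toy model of the write path; not a real GemFire XD API."""

    def __init__(self, flush_interval_s=60, batch_size_bytes=32 * 1024 * 1024):
        self.flush_interval_s = flush_interval_s
        self.batch_size_bytes = batch_size_bytes
        self.memory = {}         # in-memory copy, updated on every write
        self.queue = []          # changes waiting for the asynchronous HDFS write
        self.queued_bytes = 0
        self.hdfs = []           # what an HDFS-only reader such as Hawq can see
        self.last_flush = 0.0

    def write(self, key, value, now):
        """Apply a change to memory and queue it for HDFS."""
        self.memory[key] = value
        self.queue.append((key, value))
        self.queued_bytes += len(value)
        self.tick(now)

    def tick(self, now):
        """Flush if the batch size or the time interval has been reached."""
        size_hit = self.queued_bytes >= self.batch_size_bytes
        time_hit = now - self.last_flush >= self.flush_interval_s
        if size_hit or time_hit:
            self.flush(now)

    def flush(self, now):
        """Manual flush: the role SYS.HDFS_FLUSH_QUEUE plays in the answer."""
        self.hdfs.extend(self.queue)
        self.queue.clear()
        self.queued_bytes = 0
        self.last_flush = now


store = ToyStore()
store.write("k1", b"v1", now=1.0)
visible_before = list(store.hdfs)   # nothing flushed yet
store.flush(now=2.0)
visible_after = list(store.hdfs)    # the row is now on the HDFS side
```

Between the write and the flush, the row exists in `memory` but not in `hdfs`, which is the window in which an HDFS-only reader would miss it.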

The EVICTION BY CRITERIA clause affects which rows are kept in memory but does not change what data is written to HDFS.
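
A short Python sketch of that point, assuming a made-up predicate and row shape: the function `apply_write`, the `year` column, and the `is_old` criterion are all hypothetical and only illustrate that the eviction test gates the in-memory copy, not the HDFS write.

```python
def apply_write(memory, hdfs_log, key, row, evict_if):
    """Toy model: every row goes to the HDFS log; the predicate only
    decides whether the row is also kept in memory."""
    hdfs_log.append((key, row))
    if evict_if(row):
        memory.pop(key, None)   # matches the criterion: not kept in memory
    else:
        memory[key] = row


def is_old(row):
    """Hypothetical eviction criterion for illustration."""
    return row["year"] < 2014


memory, hdfs_log = {}, []
apply_write(memory, hdfs_log, 1, {"year": 2013}, is_old)
apply_write(memory, hdfs_log, 2, {"year": 2015}, is_old)
print(sorted(memory))    # keys still held in memory
print(len(hdfs_log))     # rows written to the HDFS side
```

Both rows reach `hdfs_log`, while only the row that does not match the criterion stays in `memory`.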

Anthony Baker 0 votes

Please check the HDFS store creation page for a reference on the properties you can define, and the HDFS eviction settings page, which describes how to configure eviction.

http://gemfirexd.docs.pivotal.io/latest/userguide/disk_storage/persist-hdfs-topics.html#topic_BEE94968A0004341A76592FC767BC497_hdfs-eviction-by-criteria
http://gemfirexd.docs.pivotal.io/latest/userguide/index.html?q=/latest/userguide/reference/language_ref/ref-create-hdfs-store.html

Thanks

William Markito 0 votes