Pivotal Knowledge Base

GPText Best Practice

This article provides best practices for managing GPText.

1. Upgrade GPDB to the latest version so that you can use the UDPIFC interconnect type.
There is a known issue where GPText functions may hang when the UDP interconnect type is used.
To check the current GPDB interconnect type:

gpconfig -s gp_interconnect_type

To change the interconnect type (GPDB must be restarted for the change to take effect):

gpconfig -c gp_interconnect_type -v UDPIFC
gpstop
gpstart
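The check-then-change steps above can be wrapped so that GPDB is only restarted when the setting actually needs to change. This is a minimal sketch, not a Pivotal-provided tool: it assumes that "gpconfig -s gp_interconnect_type" reports the setting on lines of the form "Master  value: UDPIFC" and "Segment value: UDPIFC", and the function names is_udpifc and ensure_udpifc are hypothetical. Verify the gpconfig output format of your GPDB version before relying on it.

```shell
#!/bin/sh
# Minimal sketch. ASSUMPTION: "gpconfig -s gp_interconnect_type" prints lines
# such as "Master  value: UDPIFC" and "Segment value: UDPIFC".
# is_udpifc and ensure_udpifc are hypothetical helper names.

# Read gpconfig -s output on stdin; succeed only if at least one
# "value:" line was seen and every such line reports UDPIFC.
is_udpifc() {
    awk -F': *' '
        tolower($1) ~ /value$/ {
            seen = 1
            v = toupper($2)
            gsub(/[ \t\r]/, "", v)      # drop stray whitespace
            if (v != "UDPIFC") bad = 1
        }
        END { if (seen && !bad) exit 0; exit 1 }'
}

# Switch to UDPIFC and restart GPDB only when needed.
ensure_udpifc() {
    if gpconfig -s gp_interconnect_type | is_udpifc; then
        echo "gp_interconnect_type is already UDPIFC"
        return 0
    fi
    gpconfig -c gp_interconnect_type -v UDPIFC || return 1
    # -a runs gpstop/gpstart without the interactive confirmation prompt.
    gpstop -a && gpstart -a
}
```

Source the script as gpadmin on the master host and run ensure_udpifc during a maintenance window, since it restarts the database when a change is needed.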

2. When executing GPText search queries, use TABLE(SELECT 1 SCATTER BY 1) as the table argument so that the query planner can choose an index scan.

The TABLE subquery is planned and affects the planner's row estimate for gptext.search(), but it is not executed.
For example, if your query includes:
gptext.search(TABLE(SELECT * FROM t), ...)
The query planner then estimates the number of results as the number of rows in t, which can cause it to skip an index scan. Use TABLE(SELECT 1 SCATTER BY 1) instead to avoid this issue.
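For illustration, the following sketch prints a search query that follows this advice and can be piped into psql. Several parts are assumptions rather than confirmed API details: the four-argument form gptext.search(table, index_name, query, filter), its id and score result columns, the index name demo.public.articles, and the table public.articles with a numeric id column. Check the gptext.search() signature in the documentation for your GPText release; build_search_sql is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print a GPText search query that passes a one-row dummy
# table to gptext.search(), so the planner does not size the search result by
# the row count of a real table.
# ASSUMPTIONS: the 4-argument gptext.search(table, index, query, filter) form,
# its id/score result columns, and public.articles(id) are placeholders.
# The arguments are not escaped; do not pass untrusted input.
build_search_sql() {
    index_name=$1     # e.g. demo.public.articles
    search_query=$2   # Solr query string
    cat <<EOF
SELECT a.id, s.score
FROM public.articles a,
     gptext.search(TABLE(SELECT 1 SCATTER BY 1), '$index_name', '$search_query', NULL) s
WHERE a.id = s.id::int8
ORDER BY s.score DESC;
EOF
}
```

For example, build_search_sql demo.public.articles 'gptext' | psql -d demo would run the query against a hypothetical demo database.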

3. Consider GPText memory usage when setting gp_vmem_protect_limit of GPDB.

Per the GPDB Administrator Guide, gp_vmem_protect_limit should be set to:
( X * physical_memory ) / primary_segments
where X is a value between 1.0 and 1.5. X=1 offers the best system performance. X=1.5 may cause more swapping on the system, but fewer queries will be cancelled.

The above calculation does not account for GPText memory usage.
If GPText is running on your segment servers, subtract from "physical_memory" the sum of the maximum memory used by all GPText processes on each segment server.

For example, suppose each segment server has 48 GB of physical memory and runs 6 primary segment instances, and GPText uses its default memory setting, "-Xms1024M -Xmx2048M". Then "physical_memory" for GPDB should be 48 GB - 6 * 2 GB (-Xmx) = 36 GB.
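The adjusted calculation can be sketched as a small shell function that prints the result in MB, the unit gp_vmem_protect_limit uses. It assumes, as in the example above, that every GPText process can grow to its full -Xmx heap; the function name vmem_protect_limit_mb and its argument order are hypothetical.

```shell
#!/bin/sh
# Sketch of: gp_vmem_protect_limit = X * (physical_memory - GPText memory) / primary_segments
# Arguments (hypothetical helper, sizes in GB):
#   $1 physical memory per segment server
#   $2 primary segment instances per server
#   $3 GPText processes per server
#   $4 maximum heap per GPText process (the -Xmx value)
#   $5 X, between 1.0 and 1.5 (default 1.0)
# Prints the result in MB, truncated to an integer.
vmem_protect_limit_mb() {
    awk -v phys="$1" -v prim="$2" -v procs="$3" -v xmx="$4" -v x="${5:-1.0}" 'BEGIN {
        avail_gb = phys - procs * xmx      # memory left for GPDB
        if (prim <= 0 || avail_gb <= 0) {
            print "invalid input: no memory left for GPDB" > "/dev/stderr"
            exit 1
        }
        printf "%d\n", x * avail_gb * 1024 / prim
    }'
}
```

With the figures from the example, vmem_protect_limit_mb 48 6 6 2 prints 6144 (36 GB / 6 = 6 GB per segment) and vmem_protect_limit_mb 48 6 6 2 1.5 prints 9216. The value could then be applied with gpconfig -c gp_vmem_protect_limit -v 6144 followed by a GPDB restart.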

4. Upgrade GPText to version 1.2.2.0 or later to avoid a GPText search query hang.
Due to a known bug, a GPText search query could hang when its connection is closed because of a timeout.
With the fix in GPText 1.2.2.0, the search query errors out instead of hanging.
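To decide whether a cluster still needs this upgrade, the installed version string can be compared with 1.2.2.0 field by field rather than as text (so that, for example, 1.2.10.0 counts as newer than 1.2.2.0). This sketch does not cover how to obtain the installed GPText version; version_ge is a hypothetical helper that only compares two dotted numeric version strings.

```shell
#!/bin/sh
# Hypothetical helper: succeed if dotted version $1 >= $2.
# Fields are compared numerically from left to right; missing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, pa, "."); nb = split(b, pb, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            va = (i <= na) ? pa[i] + 0 : 0
            vb = (i <= nb) ? pb[i] + 0 : 0
            if (va > vb) exit 0
            if (va < vb) exit 1
        }
        exit 0   # all fields equal
    }'
}
```

For example, version_ge 1.2.1.0 1.2.2.0 returns a non-zero status, which indicates that the upgrade is still needed.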

5. Use "gptext-state --healthcheck" to check for GPText inconsistencies.

[gpadmin@smdw gpAdminLogs]$ gptext-state --healthcheck
20131221:05:45:17:004540 gptext-state:smdw:gpadmin-[INFO]:-Locating gptext schema...
20131221:05:45:17:004540 gptext-state:smdw:gpadmin-[INFO]:-Checking for Solr Configuration files consistency...
20131221:05:45:18:004540 gptext-state:smdw:gpadmin-[INFO]:-Checking for Solr indexes consistency...
20131221:05:45:19:004540 gptext-state:smdw:gpadmin-[INFO]:-Checking for required disk space...
20131221:05:45:19:004540 gptext-state:smdw:gpadmin-[INFO]:-Checking for required user privileges...
20131221:05:45:19:004540 gptext-state:smdw:gpadmin-[INFO]:-Checking for index configuration files consistency...
20131221:05:45:20:004540 gptext-state:smdw:gpadmin-[INFO]:-Checking for indexes and database consistency...
20131221:05:45:20:004540 gptext-state:smdw:gpadmin-[INFO]:-Done
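In a long healthcheck run, a problem line is easy to miss among the [INFO] lines. A minimal filter could look like the following. It assumes that problems are reported with [WARNING], [ERROR] or [CRITICAL] level tags in the same log format as the sample above, which only shows [INFO] lines; report_healthcheck_issues is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical filter for "gptext-state --healthcheck" output read on stdin.
# ASSUMPTION: problems are logged with a [WARNING], [ERROR] or [CRITICAL] tag.
# Prints those lines and exits 1 if any were found, 0 otherwise.
report_healthcheck_issues() {
    awk '/\[(WARNING|ERROR|CRITICAL)\]/ { print; found = 1 }
         END { if (found) exit 1; exit 0 }'
}
```

For example, gptext-state --healthcheck 2>&1 | tee healthcheck.log | report_healthcheck_issues keeps the full output in healthcheck.log and prints only the flagged lines.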

6. Use "gptext-detective" to collect logs when opening a support case.
Note that gptext-detective and the healthcheck can take a considerable amount of time, depending on cluster size and index size respectively.

[gpadmin@smdw gpAdminLogs]$ gptext-detective
20131221:05:49:18:007151 gptext-detective:default-[INFO]:-Getting Solr directory locations...
20131221:05:49:19:007151 gptext-detective:default-[INFO]:-Getting Index Names...
20131221:05:49:19:007151 gptext-detective:default-[INFO]:-Checking for missing Solr directories...
20131221:05:49:20:007151 gptext-detective:default-[INFO]:-SOLR DIRECTORY CHECK: PASSED
20131221:05:49:20:007151 gptext-detective:default-[INFO]:-Collecting information
Made gpadmin directory
  +     get gpAdminLogs/gptext-*.log
  +     get Solr directory Logs
  +     get gpdb host logs
  +     perform functional tests
  + makeTar
20131221:05:50:16:007151 gptext-detective:default-[INFO]:-Done
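Because gptext-detective and the healthcheck can run for a long time, it may be safer to start them detached from the terminal so that a dropped SSH session does not interrupt them. A minimal sketch using the standard nohup idiom follows; it assumes the tool runs unattended, as the sample output above suggests, and run_detached and the log directory are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command in the background under nohup, write its
# stdout/stderr to a timestamped log file, and print the log file path.
# Usage: run_detached LOG_DIR COMMAND [ARGS...]
run_detached() {
    log_dir=$1
    shift
    mkdir -p "$log_dir" || return 1
    log_file="$log_dir/$(basename "$1")_$(date +%Y%m%d_%H%M%S).log"
    # stdin from /dev/null so the job never waits for terminal input.
    nohup "$@" >"$log_file" 2>&1 </dev/null &
    echo "$log_file"
}
```

For example, run_detached "$HOME/gptext_logs" gptext-detective prints the log path, which can then be followed with tail -f.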

7. Remember to create two soft links after GPDB is upgraded.

For details, refer to the article "What needs to be done after upgrading GPDB."
