After changing tables to parquet format, some complex queries fail with errors about insufficient statement memory. This can happen whether YARN or Resource Queues are used for HAWQ resource management.
"ERROR","53000","insufficient memory reserved for statement",,"Increase statement memory or reduce the number of Parquet tables to be scanned.",,,
If the target tables are parquet, the planner reserves more memory for scanning them by design; one reason is that parquet tables may be compressed. The memory accounting for parquet insert/select statements is estimated as follows:
- For insert: The memory reserved for each target table is one row group size (default 8 MB) if the table is uncompressed, and 2 * rowgroupSize if it is compressed.
- For scan: Given the attribute list to be scanned, the memory reserved for an uncompressed table is Sum(toBeSelectedAttributeLen) / wholeRecordLen * rowgroupSize. This doubles if the table is compressed.
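The two rules above can be sketched as a small calculation. This is a hedged illustration of the accounting described in the text, not HAWQ source code; the function names are invented here, and the 8 MB default comes from the text.

```python
ROW_GROUP_SIZE_MB = 8.0  # default parquet row group size, per the text


def insert_memory_mb(compressed: bool,
                     row_group_size_mb: float = ROW_GROUP_SIZE_MB) -> float:
    """Memory reserved per target parquet table for an INSERT:
    one row group if uncompressed, doubled if compressed."""
    return 2 * row_group_size_mb if compressed else row_group_size_mb


def scan_memory_mb(selected_attr_len: float, whole_record_len: float,
                   compressed: bool,
                   row_group_size_mb: float = ROW_GROUP_SIZE_MB) -> float:
    """Memory reserved per scanned parquet table:
    Sum(toBeSelectedAttributeLen) / wholeRecordLen * rowgroupSize,
    doubled if the table is compressed."""
    mem = selected_attr_len / whole_record_len * row_group_size_mb
    return 2 * mem if compressed else mem


# Example: scanning 2 of 8 equally sized columns of a compressed table,
# then inserting into a compressed table.
print(scan_memory_mb(2, 8, compressed=True))    # 4.0 (MB)
print(insert_memory_mb(compressed=True))        # 16.0 (MB)
```

The point of the sketch is that scan memory grows with the fraction of the record width actually selected, so wide selects over many compressed parquet tables can quickly exceed the statement memory quota.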
Parquet files are composed of row groups, and the specified column chunks must be read into memory one row group at a time. Row groups should not be too small, so that skipping unneeded column chunks remains worthwhile and metadata overhead stays low. AO (append-only) storage, by contrast, is laid out record by record with no complex composition, so it can be read into memory in small blocks.
For the detailed implementation, refer to the code here.
Increase the value of the following parameters to raise the number of virtual segments per query and the memory quota of each virtual segment:
This can be modified at session level:
SET hawq_rm_stmt_nvseg=X;
SET hawq_rm_stmt_vseg_memory='Xmb';
To make the settings permanent, add the entries to Custom hawq-site through Ambari instead.
Alternatively, if HAWQ is running in Standalone resource management mode, edit the Resource Queues:
ALTER RESOURCE QUEUE queue1 WITH (VSEG_RESOURCE_QUOTA='mem:Xmb');
Or use YARN Queue Manager if YARN is being used for managing HAWQ resources.