Using Database Designer to Build Projections with Optimal Sort Order in HP Vertica

The performance level of your queries in Vertica depend on the design of your projections. In the projection definitions you specify how your data will be segmented and how your data will be sorted. If you are going to make the right choices your queries will perform really well. I covered projection segmentation in my previous post “Building projections with optimal segmentation in HP Vertica”. In this post we will examine sort order optimization.

What tools I can use to design sort order

Standard Vertica install has a tool called “Database Designer”. When Vertica is installed Database Designer is also installed and available on every node in the cluster. Database Designer can be run in many ways: a) from administration tools; b) from management console; c) using Database Designer functions. The most common way to start Database Designer is by using administration tools. I will use this method throughout this post.

What do I need to have if I want to get sort order recommendations from Database Designer?

You need:

a running Vertica cluster;
data loaded into tables you will be working on. I am going to use the same dataset I used for the post “Optimizing storage footprint in HP Vertica with Database Designer”. If you need guidance on loading your data you can refer to that post. I have a section on loading data to Vertica;
SQL queries you will be executing as a normal workload. Database Designer will be using your SQL queries to come up with most optimal sort order.

How do I prepare a file with SQL queries for Database Designer

The most important thing in this exercise is to have a good set of SQL queries. Create this set by taking SQL queries you are planning to execute in your Vertica cluster. You need to have a good number of queries. An optimal number of queries in the file is 100. Database Designer does not have a set limit on the number of queries, but it is better not to exceed 100 significantly. The query set you are creating should represent each class of queries you will be running. You can also put several variations of queries in each query class.

Put your queries in a text file and make sure to delimit each query from the next one by semicolon (;). To make the file easier to work with I will usually add an empty line after each query. Once the file is ready, upload it to the node you will be running adminTools on. Once the file is on the node make sure that Vertica process will be able to read it. Vertica generally runs under dbadmin user.

Running Database Designer

I will work in the /tmp/sortorder directory. Currently it only has the file with my SQL queries

To run the Database Designer you need to start Vertica’s admintools.

ssh to one of the nodes of your Vertica cluster
run sudo su – dbadmin
run admintools

You will see Administration Tools Main Menu

On the “Main Menu”: Navigate and hit on “Configuration Menu”
On the “Configuration Menu”: Navigate and hit on “Run Database Designer”
On the “Select a database for design”: select the database you will be working with and hit “OK”
You will be prompted for the password. Provide the password and hit OK
On this screen enter the directory for the Database Designer output. This can be any directory dbadmin can write to. In this post I will use /tmp/sortorder
In this screen provide a name for the design you are working on (this name will be the prefix of all output files).
On the “Design Type” screen, select “Comprehensive” design type
On this screen choose which schema your table(s) reside in and hit “OK”.
On the “Design Options” screen: [ * ] Check “Optimize with queries”. We want Database Designer to use the SQL queries we prepared for optimization. [ * ] Check “Update statistics”. The Database Designer needs statistics updated to provide good advice. [ ] Keep “Deploy design” unchecked. At this time we do not want the Database Designer to make modifications to schema.
On this screen provide the full path to the file with your SQL queries, which will be the normal workload on this Vertica cluster
On this screen leave the K-safety at its default value = 1. K-safety is a number which represents how many extra copies of data you have for redundancy purposes. I will cover it in future posts.
On the “Optimization Objective” screen we will stay at default, which is “Balanced query/load performance”, because I am planning to run queries and load extra data simultaneously.
Confirm to Database Designer that we finalized all options and want to start processing by hitting “Proceed”
Database Designer will print a series of messages on the status of processing. Once it is completed it will ask you to hit “Enter” to return back to adminTools.

Once you are back in adminTools navigate out of “Configuration Menu” by hitting “Main Menu” and close adminTools by hitting the “Exit” menu item.

Analyzing Database Designer output

After running Database Designer we got a lot of additional files in our working directory.

If we will examine the contents of the working directory we will find the following categories of files

a) Backup of schema and objects before Database Designer execution [ catalog_dump.sql and schema_dump.xml ];
b) Files logging inputs provided by you in Database Designer screens [ design_config_name.txt and sortorder_params.txt ];
c) Detailed log of Database Designer execution [ designer.log ];
d) Schema design recommended by Database Designer based on your inputs [ sortorder_design.sql and sortorder_deploy.sql ].

The files we are most interested in are sortorder_design.sql and sortorder_deploy.sql. Both files have the prefix sortorder, as Database Designer uses the string we specified for design name as the prefix for these file names.

sortorder_design.sql: file contains SQL script to create the recommended objects from scratch. You can use it if all you need is to add newly recommended objects.

sortorder_deploy.sql: file contains SQL script to modify existing schema and its objects. You can use it if you accept the newly recommended structure and properties and want to take the existing schema to this new recommended design. If you would check “Deploy design” on “Design Options” screen Database Designer will run this script for you at the end of its run. In most cases the script will create new projections, move data and drop existing projections.

Examining Database Designer Recommended Sort Order

In the beginning of this exercise I created a new table and Vertica auto created a superprojection when I loaded the data. Those superprojections had no sort order at all.

When I open sortorder_design.sql file I see following SQL:

CREATE PROJECTION sortorder.ORDER_FACTS_DBD_1_rep_sortorder 
/*+createtype(D)*/
(
...
)
AS
 SELECT CLIENT_ID, 
...
 FROM sortorder.ORDER_FACTS 
 ORDER BY SOURCESYSTEM_CD,
          CLIENT_ID,
          MANUFACTURER_ID,
          QUANTITY,
          SHIPPED_DATE,
          PRODUCT_CD,
          ORDER_DATE,
          ORDER_TOTAL
...

Based on the queries I provided to Database Designer it recommends the sort order listed above to make my queries perform better. You can see ORDER BY clause containing table columns. When Vertica will be loading data into this projection it will sort it before writing to the disk. At the time you will be running SQL queries optimizer will use the Vertica catalog information about existing projections and sort orders to come up with the most efficient execution plan.