A Study of the Efficiency of Spatial Indexing Methods Applied to Large Astronomical Databases


This is the title of a poster presented by Tom Donaldson, Bernie Shiao (both of STScI), John Good (Caltech/IPAC-NExScI), and myself at the 231st AAS meeting in Washington DC (January 8-12). I have attached a copy of the poster below and linked the paper we prepared for the proceedings of ADASS XXVII (Santiago, Chile).
Briefly, we compared database performance across the following:

Hierarchical Triangular Mesh (HTM) vs. HEALPix indexing, over a range of indexing depths (cell sizes)
PostgreSQL vs. SQL Server
Linux vs. Solaris vs. Windows

We did this for two catalogs: the 2MASS All Sky Catalog (complete sky coverage; 470,000,000 sources) and the unmerged Hubble Source Catalog (sparse sky coverage; 384,000,000 sources).
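As a rough guide to what "indexing depth (cell size)" means in practice, the sketch below uses the standard cell counts for each scheme: 12·4^k pixels for HEALPix at order k, and 8·4^d trixels for HTM at level d. It reports the square root of the mean cell area as a rough cell side, and the mean number of sources per cell for 470,000,000 sources, under the assumption (mine, and unrealistic) that the sources are spread uniformly over the sky. HEALPix pixels are exactly equal in area, while HTM trixels vary somewhat, so the HTM numbers are averages only. The function names are hypothetical.

```python
import math

SKY_SR = 4.0 * math.pi                      # whole sky in steradians
ARCSEC_PER_RAD = 180.0 / math.pi * 3600.0   # radians -> arcseconds


def n_cells(scheme: str, depth: int) -> int:
    """Number of cells at a given depth: HEALPix order or HTM level."""
    if scheme == "healpix":
        return 12 * 4 ** depth   # Npix = 12 * Nside^2, with Nside = 2^order
    if scheme == "htm":
        return 8 * 4 ** depth    # 8 base triangles, each split in 4 per level
    raise ValueError(f"unknown scheme: {scheme}")


def mean_cell_side_arcsec(scheme: str, depth: int) -> float:
    """Square root of the mean cell area, used as a rough linear cell size."""
    area_sr = SKY_SR / n_cells(scheme, depth)
    return math.sqrt(area_sr) * ARCSEC_PER_RAD


def mean_sources_per_cell(scheme: str, depth: int, n_sources: int) -> float:
    """Mean cell occupancy, assuming (unrealistically) uniform sky coverage."""
    return n_sources / n_cells(scheme, depth)


if __name__ == "__main__":
    N_2MASS = 470_000_000
    for depth in range(6, 15, 2):
        for scheme in ("htm", "healpix"):
            side = mean_cell_side_arcsec(scheme, depth)
            occ = mean_sources_per_cell(scheme, depth, N_2MASS)
            print(f"{scheme:8s} depth {depth:2d}: side ~{side:9.1f} arcsec, "
                  f"~{occ:10.1f} sources/cell")
```

Each step in depth quarters the cell area and halves the rough cell side, so depth acts as a knob on how many sources land in each cell.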
The main results are:

Query time is dominated by I/O.
Indexing depth, not the choice of index, has the greatest impact on performance: it sets the trade-off between reading too many sources (cells too large) and visiting too many cells (cells too small).
The optimum indexing depth depends on the distribution of query radii. (We used radii spaced on a log scale from 1 arcsec to 1 degree.)
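The trade-off between too many sources and too many cells can be illustrated with a toy cost model. This is a sketch under my own assumptions, not the model used in the study: it assumes uniform source density (the 2MASS count spread over the whole sky), treats HEALPix cells as squares whose side is the square root of the mean pixel area, and charges a made-up cost_per_cell (index lookup or seek) for every cell a cone search touches plus a made-up cost_per_source for every row read from those cells. Query radii are spaced on a log scale from 1 arcsec to 1 degree, as in the study. All names and constants are hypothetical.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)            # one arcsecond in radians
DENSITY_2MASS = 470_000_000 / (4.0 * math.pi)  # sources per steradian, if uniform


def log_spaced_radii(r_min_arcsec=1.0, r_max_arcsec=3600.0, n=25):
    """Query radii (radians) evenly spaced in log(r) between r_min and r_max."""
    ratio = r_max_arcsec / r_min_arcsec
    return [r_min_arcsec * ratio ** (i / (n - 1)) * ARCSEC for i in range(n)]


def healpix_side(order):
    """Rough HEALPix cell side (radians): square root of the mean pixel area."""
    return math.sqrt(4.0 * math.pi / (12 * 4 ** order))


def query_cost(radius, side, density, cost_per_cell, cost_per_source):
    """Toy cost of one cone search over cells of the given side.

    Cells touched ~ area of the cone padded by one cell side, divided by the
    cell area. Every source in a touched cell is read and then filtered by
    exact distance, so rows read ~ density * (cells touched * cell area).
    """
    cells = math.pi * (radius + side) ** 2 / side ** 2
    sources_read = density * cells * side ** 2
    return cost_per_cell * cells + cost_per_source * sources_read


def mean_cost(order, radii, density, cost_per_cell, cost_per_source):
    """Average toy cost over a set of query radii at one HEALPix order."""
    side = healpix_side(order)
    return sum(query_cost(r, side, density, cost_per_cell, cost_per_source)
               for r in radii) / len(radii)


def best_order(radii, density, cost_per_cell=100.0, cost_per_source=1.0,
               orders=range(20)):
    """HEALPix order (0-19) minimizing the mean toy cost.

    The default cost constants are arbitrary, hypothetical values.
    """
    return min(orders, key=lambda k: mean_cost(k, radii, density,
                                                cost_per_cell, cost_per_source))


if __name__ == "__main__":
    print("1 arcsec to 1 deg (log spaced):",
          best_order(log_spaced_radii(), DENSITY_2MASS))
    print("1 arcsec only:", best_order([ARCSEC], DENSITY_2MASS))
    print("1 deg only:", best_order([math.pi / 180.0], DENSITY_2MASS))
```

Only the ratio of cost_per_cell to cost_per_source affects which order wins, and in this model the optimum cell side for a single radius r works out to (cost_per_cell·r / (cost_per_source·density))^(1/3). Small radii therefore favor deeper (smaller) cells and large radii favor shallower ones, which is one way to see why the best depth depends on the mix of query radii.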

See the poster for the figures showing these results.
 

Link to ADASS paper (PDF)
 
