A few UCD Topic Maps under Omnigator (old UCD model)


A ppt presentation given at SPIE 2002
A UCD Topic Map generator

When we started work on the UCD Topic Map, we were thinking of being able to merge appropriate columns from different catalogs using the UCDs which so uniformly describe the columns.

However, before that can happen, many small issues have to be taken care of. Here we present a road map of the procedure, together with the steps that have been completed so far. We have used Omnigator, a free tool from Ontopia, to explore Topic Maps. Many snapshots appear on this page. Clicking on the thumbnails will take you to full-size images.

A brief reintroduction to topic maps
What is a UCD Topic Map?
How does one move around inside the map?
What is the map capable of?
Is that all?
What exactly does the Topic Map generator do?
But how is this any different or better than all the other services out there?


Reiterating what a Topic Map is

A Topic Map is to XML what SQL is to databases. Thus, TMs are the tools to play around with within the XML space. What do TMs have? They have topics, relationships between topics (called associations, generally binary), occurrences of the topics in different contexts (called scopes), and so on. A TM, then, is a network of hyperlinks overlaid on an information resource, connecting the items (columns, topics) that the author of the TM thought interesting. Some more details can be found at the page we had made earlier.

What is a UCD Topic Map?

Different catalogs and tables have different column names. Sometimes two authors use different names for the same quantity (e.g. bmag or Bmag for the B magnitude), and sometimes the same name is used for different quantities. UCDs, or Uniform Column Descriptors, bring about uniformity and standardization by assigning to every column a name from a fixed set of standard names. To allow for subquantities and variations in the quantities described, the UCDs are structured hierarchically. The UCD TM takes advantage of the uniformity brought about by UCDs in order to connect columns of different catalogs. A TM needs topics to work on. In the present case, five basic topics, along with their internal associations, are chosen. These are:

  1. UCDs: This is just the standard list and contains 1409 names.
  2. Tables: Any set of tables can be used to form a Topic Map. We currently use the 100 most popular tables according to Vizier usage. But, as will be described later, users can make their own Topic Maps using any set of tables.
  3. UCD description: Each UCD has a description associated with it. These descriptions are topics too.
  4. Column description: UCDs corresponding to each table are associated with column names from the table.
  5. Units: The columns are often associated with units for the quantities they hold.

Many associations can be described:

and so on.

Besides these, there are pointers (external links) to the actual full tables, possibilities of merging different columns, and pointers to plotting possibilities within a catalog. All these topics, their associations, and their occurrences together form a layer of hyperlinks. That is the UCD Topic Map.
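As a rough illustration of how the five topic types hang together, the following minimal Python sketch builds a handful of associations from per-column records. The table names, the association names (table-has-ucd and so on), the descriptions and the "mag" unit are all assumptions made for this example; they are not the names used in the actual UCD Topic Map.

```python
from collections import namedtuple

# One record per catalog column. All values below are illustrative only.
Column = namedtuple("Column", "table name ucd unit description")

columns = [
    Column("table_A", "F12", "PHOT_FLUX_IR_12", "Jy", "Flux density at 12 micron"),
    Column("table_B", "FLUX12", "PHOT_FLUX_IR_12", "Jy", "12 micron flux"),
    Column("table_B", "Rmag", "PHOT_MAG_R", "mag", "R magnitude"),
]

def build_associations(cols):
    """Return a set of (association_type, topic_1, topic_2) triples.

    The association type names are hypothetical labels for this sketch.
    """
    assoc = set()
    for c in cols:
        assoc.add(("table-has-ucd", c.table, c.ucd))
        assoc.add(("ucd-has-column", c.ucd, c.table + "." + c.name))
        assoc.add(("ucd-has-unit", c.ucd, c.unit))
    return assoc

associations = build_associations(columns)
for triple in sorted(associations):
    print(triple)
```

Because the associations are kept in a set, a fact that appears in several tables (such as PHOT_FLUX_IR_12 being measured in Jy) is stored only once.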

How does one move around inside the map?

The UCD TMs are essentially Data Discovery Tools. At present we have made UCD TMs from a few sets of tables. One of them uses the 100 most frequently used catalogs from Vizier. It can be found at UCD Topic Map. This is how a snapshot of the main page looks:

Clicking on table in the subject index will show you the list of tables used:

Clicking on any of the individual tables takes you to the page containing a list of UCDs, column descriptions, UCD descriptions, and Units present in that table:

Similarly clicking on UCD will take you to the list of UCDs :

If you click on any of the UCDs present in these 100 catalogs, you are taken to a page where that UCD is the main topic. Among other links, the page contains a list of the tables in which the UCD occurs. If you choose, as an example, the 12 micron IR flux density, PHOT_FLUX_IR_12, you find that it occurs in several tables under different column headings (e.g. C, F12, F12umEst, FLUX12, Fnu_12), but always with the unit Jy (which, incidentally, as you can discover by clicking on Jy on that page, is used for radio and IR fluxes):

Similarly, PHOT_MAG_R takes you to:

In short, we find that because of the associations we have used, all the various topics are well interlinked and we are ready to start asking real questions.
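The kind of navigation described in this section, going from a UCD to the tables, column headings and units where it occurs, amounts to an inverted index over the column metadata. Here is a minimal Python sketch of that idea. The column names F12, FLUX12 and Fnu_12 and the unit Jy come from the example above; the table names and the "mag" unit are hypothetical.

```python
from collections import defaultdict

# (table, column name, UCD, unit); table names are hypothetical.
rows = [
    ("table_A", "F12", "PHOT_FLUX_IR_12", "Jy"),
    ("table_B", "FLUX12", "PHOT_FLUX_IR_12", "Jy"),
    ("table_C", "Fnu_12", "PHOT_FLUX_IR_12", "Jy"),
    ("table_C", "Rmag", "PHOT_MAG_R", "mag"),
]

def index_by_ucd(rows):
    """Map each UCD to the (table, column, unit) triples where it occurs."""
    index = defaultdict(list)
    for table, column, ucd, unit in rows:
        index[ucd].append((table, column, unit))
    return index

ucd_index = index_by_ucd(rows)
hits = ucd_index["PHOT_FLUX_IR_12"]
tables = sorted({table for table, _, _ in hits})
units = {unit for _, _, unit in hits}
print(tables, units)
```

If the set of units for a UCD contains a single entry, as it does here, the corresponding columns are at least candidates for a meaningful comparison across tables.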

What is the map capable of?

Where is all this getting us? Basically two places. First, using this mechanism it is easy to figure out which catalogs hold your columns of interest and then, using the column descriptions and units, to figure out whether those catalogs can be meaningfully merged. Within a single UCD Topic Map, such as the one made of the 100 most popular catalogs, one can already look for interesting traits by asking different questions. Are there IR-related columns which we could possibly merge with X-ray-related columns in some other catalogs? Are both these catalogs extragalactic? Or are they likely to provide me with a way of identifying galactic IR contaminants? By asking such questions and bringing up the appropriate topics, one can output the (names of) columns from different tables that one would like to combine. These can then be fed to one of the other VO services (like the toy model by Roy Williams) to provide an actual merge. Topic Maps themselves may be stretched to do this by figuring out positional accuracies after the necessary coordinate precession and taking into account the various error ellipses, but that part is best left to a specialized service.

The more exciting possibility is of course to merge (the topics, associations and occurrences of) your own table with any of the tables available in the Topic Map. A cgi-bin program allows you to make a Topic Map out of your own table (provided it has the UCDs and is in a standard format) and merge it with any other Topic Map. The topic naming convention ensures that identical topics (defined through subjectIdentity) are merged.
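The merging rule, that topics sharing a subject identity become one topic, can be pictured with a small Python sketch. This is only an illustration of the principle and not the merge performed by the Topic Map engine: each map is modelled as a plain dictionary keyed by a subject identifier, and the identifiers, table names and occurrence strings are hypothetical.

```python
def merge_topic_maps(tm_a, tm_b):
    """Merge two topic maps keyed by subject identifier.

    Each map is a dict: subject identifier -> {"names": set, "occurrences": set}.
    Topics with the same identifier are combined; the inputs are not modified.
    """
    merged = {}
    for topic_map in (tm_a, tm_b):
        for subject, topic in topic_map.items():
            target = merged.setdefault(
                subject, {"names": set(), "occurrences": set()}
            )
            target["names"] |= topic["names"]
            target["occurrences"] |= topic["occurrences"]
    return merged

# Hypothetical subject identifiers and occurrences, for illustration only.
popular = {
    "ucd:PHOT_FLUX_IR_12": {"names": {"PHOT_FLUX_IR_12"},
                            "occurrences": {"table_A#F12"}},
    "table:table_A": {"names": {"table_A"}, "occurrences": set()},
}
mine = {
    "ucd:PHOT_FLUX_IR_12": {"names": {"PHOT_FLUX_IR_12"},
                            "occurrences": {"my_table#FLUX12"}},
    "table:my_table": {"names": {"my_table"}, "occurrences": set()},
}

merged = merge_topic_maps(popular, mine)
print(sorted(merged))
```

After the merge, the single PHOT_FLUX_IR_12 topic carries the occurrences from both maps, which is what lets a user's own table show up next to the existing ones.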

A good test case will be to get a list of X-ray tables (or a smaller list of X-ray tables) and make a UCD Topic Map out of it by passing it to the CGI program. One can then explore it separately and then also merge it with one of the other UCD TMs to explore interrelationships between these sets of tables.

Note the External Resources links when you display some topics (e.g. UCDs and tables). These point to resources that do not reside within the Topic Map but are related to the current topic. One can provide any number of links per topic here. For a table, the link currently points to the table on Vizier, and for UCDs the multiple links point to all the tables at Vizier which contain that UCD. Note that the TM itself deals with just the metadata of the catalog. It can, however, pass on the column numbers as arguments, which the external links can pick up and use for all sorts of interesting things. As a result, the following can be easily provided:

The possibilities are limitless.

Is that all?

By no means. Earlier we stated that Topic Maps are to XML files what SQL is to databases. Well, we were overstating it there. One of the powerful aspects of SQL is its ability to query. Without some kind of querying capability, TMs would be that much lamer. A full-blown Topic Map Query Language (TMQL) is being developed. But until then, we do have the capability to execute queries using a syntax that combines the good qualities of SQL and Prolog:

Additionally, the Topic Map can also be indexed. The text input box seen at the top accepts text to search for. The results are returned with a significance number attached. The query and search facilities make the Topic Map very versatile.

What exactly does the Topic Map generator do?

The input to the cgi-bin program that generates the Topic Map is a list of tables. The XML files corresponding to these catalogs contain the UCDs along with the field names, i.e. the metadata. If locally present, these files are used directly. Otherwise they are fetched from Vizier using wget and the URL scheme: where B/hst is one of the catalogs (HST).

Some of these catalogs actually have a set of catalogs associated with them. It is first determined whether each catalog is solitary or part of a bunch. If the latter, a set of XML files for the subcatalogs is similarly obtained and combined with the solitary catalogs. (Catalogs without proper UCD structure can be rejected.)

The UCDs and their corresponding column names for each catalog are then obtained, and a single XML Topic Maps (XTM) document, NAME.xtm, is formed and moved to the directory where the Topic Map server looks for Topic Maps.
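To give a feel for the final step, here is a minimal Python sketch that turns a table-to-UCD mapping into an XTM 1.0 document using only the standard library. It is not the generator described above: the catalog names X/demo and Y/demo, the topic id scheme, and the table-has-ucd association type are assumptions made for this example, and the sketch covers only tables and UCDs (no descriptions, units or external links).

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM_NS)
ET.register_namespace("xlink", XLINK_NS)
HREF = "{%s}href" % XLINK_NS

def q(tag):
    """Qualify an element name with the XTM 1.0 namespace."""
    return "{%s}%s" % (XTM_NS, tag)

def add_topic(topic_map, topic_id, name, type_id=None):
    """Append a <topic> with an optional type and one base name."""
    topic = ET.SubElement(topic_map, q("topic"), {"id": topic_id})
    if type_id is not None:
        inst = ET.SubElement(topic, q("instanceOf"))
        ET.SubElement(inst, q("topicRef"), {HREF: "#" + type_id})
    base_name = ET.SubElement(topic, q("baseName"))
    ET.SubElement(base_name, q("baseNameString")).text = name
    return topic

def add_association(topic_map, assoc_type, members):
    """Append an <association>; members is a list of (role id, player id)."""
    assoc = ET.SubElement(topic_map, q("association"))
    inst = ET.SubElement(assoc, q("instanceOf"))
    ET.SubElement(inst, q("topicRef"), {HREF: "#" + assoc_type})
    for role, player in members:
        member = ET.SubElement(assoc, q("member"))
        role_spec = ET.SubElement(member, q("roleSpec"))
        ET.SubElement(role_spec, q("topicRef"), {HREF: "#" + role})
        ET.SubElement(member, q("topicRef"), {HREF: "#" + player})
    return assoc

def build_xtm(table_ucds):
    """Build an XTM string from a dict: table name -> list of UCDs."""
    topic_map = ET.Element(q("topicMap"))
    # Typing topics; their ids are hypothetical choices for this sketch.
    for topic_id, name in (("table", "Table"), ("ucd", "UCD"),
                           ("table-has-ucd", "Table has UCD")):
        add_topic(topic_map, topic_id, name)
    known_ucds = set()
    for table, ucds in sorted(table_ucds.items()):
        table_id = "table-" + table.replace("/", "-")
        add_topic(topic_map, table_id, table, type_id="table")
        for ucd in ucds:
            ucd_id = "ucd-" + ucd
            if ucd not in known_ucds:  # one topic per UCD, shared by tables
                known_ucds.add(ucd)
                add_topic(topic_map, ucd_id, ucd, type_id="ucd")
            add_association(topic_map, "table-has-ucd",
                            [("table", table_id), ("ucd", ucd_id)])
    return ET.tostring(topic_map, encoding="unicode")

# Hypothetical catalogs with illustrative UCD lists.
example = {"X/demo": ["POS_EQ_RA_MAIN", "PHOT_MAG_R"], "Y/demo": ["PHOT_MAG_R"]}
xtm_text = build_xtm(example)
print(xtm_text[:200])
```

Writing xtm_text to NAME.xtm and moving that file into the server's directory would then be left to the calling program. Creating each UCD topic only once is what makes two tables that share a UCD end up linked through the same topic.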

But how is this any different or better than all the other services out there?

Most services deal with single catalogs. Of the others, most assume certain types of associations in the data mixing that they provide e.g. image overlay, finding charts etc. The UCD TM provides you with a resource which you can go on exploring to look for whatever hidden connections are out there. Moreover, you can always make a modified Topic Map of your own to incorporate any of your hunches and explore further. You can even make entirely different TMs based on this data trivially. TMs are more than interactive. They are alive.


Please browse and provide comments.

Ashish Mahabal
Last outdated on: Aug 18, 2002