When we started work on the UCD Topic Map, we were thinking of being able to merge appropriate columns from different catalogs using the UCDs which so uniformly describe the columns.
However, before that can happen, many small issues have to be taken care of. Here we present a road map of the procedure, with the steps that have been completed so far. We have used the free tool Omnigator from Ontopia to explore Topic Maps. Many snapshots will be seen on this page.
A brief reintroduction to topic maps
A Topic Map is to XML what SQL is to databases. Thus, TMs are the tools for moving around within the XML space.
What do TMs have? They have topics, relationships between topics (called associations, generally binary), occurrences of the topics in different contexts (called scopes), and so on. A TM, then, is a network of hyperlinks overlaid on an information resource, connecting the items (columns, topics) that the author of the TM thought interesting. Some more details can be found on the page we made earlier.
What is a UCD Topic Map?
Different catalogs and tables have different column names. Sometimes two authors use different names for the same quantity (e.g. bmag or Bmag for the B magnitude), and sometimes the same name is used for different quantities. UCDs, or Uniform Column Descriptors, bring uniformity and standardization by assigning to every column a name from a fixed set of standard names, the UCDs. To allow for subquantities and variations in the quantities described, the UCDs are structured hierarchically. The UCD TM takes advantage of this uniformity to connect columns of different catalogs.
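Here is a minimal sketch of the idea. It assumes column metadata is already available as (table, column name, UCD) triples. The table names are hypothetical, and the column headings come from the PHOT_FLUX_IR_12 example further down this page. Purely for illustration, it also assumes that the hierarchy can be read off the underscore-separated words of a UCD name, so that grouping at a shallower depth gathers related quantities.

```python
from collections import defaultdict

# Hypothetical column metadata from three catalogs: (table, column name, UCD).
COLUMNS = [
    ("cat_a", "F12", "PHOT_FLUX_IR_12"),
    ("cat_b", "FLUX12", "PHOT_FLUX_IR_12"),
    ("cat_c", "Fnu_12", "PHOT_FLUX_IR_12"),
    ("cat_c", "Rmag", "PHOT_MAG_R"),
]


def ancestors(ucd):
    """Parent UCDs, nearest first, read off the underscore-separated words.

    Assumption for illustration: PHOT_FLUX_IR_12 sits under PHOT_FLUX_IR,
    which sits under PHOT_FLUX, which sits under PHOT.
    """
    words = ucd.split("_")
    return ["_".join(words[:i]) for i in range(len(words) - 1, 0, -1)]


def group_by_ucd(columns, depth=None):
    """Map each UCD (or its ancestor `depth` words deep) to (table, column) pairs."""
    groups = defaultdict(list)
    for table, column, ucd in columns:
        key = "_".join(ucd.split("_")[:depth]) if depth else ucd
        groups[key].append((table, column))
    return dict(groups)


print(group_by_ucd(COLUMNS)["PHOT_FLUX_IR_12"])
# [('cat_a', 'F12'), ('cat_b', 'FLUX12'), ('cat_c', 'Fnu_12')]
print(ancestors("PHOT_FLUX_IR_12"))            # ['PHOT_FLUX_IR', 'PHOT_FLUX', 'PHOT']
print(sorted(group_by_ucd(COLUMNS, depth=2)))  # ['PHOT_FLUX', 'PHOT_MAG']
```

The three differently named 12 micron columns end up under one key, which is the uniformity the UCD TM relies on.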
A TM needs topics to work on. In the present case, five basic topics, along with their internal associations, have been chosen, and many associations among them can be described.
Besides these, there are pointers (external links) to the actual full tables, possibilities of merging different columns, and pointers to plotting possibilities within a catalog. All these topics, their associations, and occurrences together form a layer of hyperlinks. That is the UCD Topic Map.
How does one move around inside the map?
The UCD TMs are essentially Data Discovery Tools.
At present we have made UCD TMs from a few sets of tables. One of them uses the 100 most frequently used catalogs from VizieR. It can be found at UCD Topic Map.
This is how a snapshot of the main page looks:
Clicking on "table" in the subject index will show you the list of tables used:
Clicking on any of the individual tables takes you to a page containing the list of UCDs, column descriptions, UCD descriptions, and units present in that table:
Similarly, clicking on "UCD" will take you to the list of UCDs:
Clicking on any of the UCDs present in these 100 catalogs takes you to a page where that UCD is the main topic. Among other links, the page contains a list of the tables in which the UCD occurs.
If you choose, as an example, the 12 micron IR flux density, PHOT_FLUX_IR_12, you find that it occurs in several tables under different column headings (e.g. C, F12, F12umEst, FLUX12, Fnu_12), but all with the unit Jy (which, incidentally, as you can discover by clicking on Jy on that page, is used for radio and IR fluxes):
Similarly, PHOT_MAG_R takes you to:
In short, we find that because of the associations we have used, all the
various topics are well interlinked and we are ready to start asking
real questions.
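The UCD page described above is essentially a reverse index from a UCD to the columns that carry it. The rough sketch below shows one way to build it. The table names are hypothetical, and one mJy row is invented purely to show how a unit mismatch would surface.

```python
# Hypothetical metadata: (table, column heading, UCD, unit). The mJy row is
# invented to show what a unit mismatch would look like.
COLUMNS = [
    ("cat_a", "F12", "PHOT_FLUX_IR_12", "Jy"),
    ("cat_b", "FLUX12", "PHOT_FLUX_IR_12", "Jy"),
    ("cat_c", "Fnu_12", "PHOT_FLUX_IR_12", "Jy"),
    ("cat_d", "S12", "PHOT_FLUX_IR_12", "mJy"),
    ("cat_c", "Rmag", "PHOT_MAG_R", "mag"),
]


def ucd_page(columns, ucd):
    """Collect what a page for one UCD shows: its (table, heading, unit)
    occurrences, plus the set of units involved."""
    rows = [(table, heading, unit)
            for table, heading, code, unit in columns if code == ucd]
    return rows, {unit for _, _, unit in rows}


rows, units = ucd_page(COLUMNS, "PHOT_FLUX_IR_12")
for table, heading, unit in rows:
    print(f"{table:6} {heading:8} {unit}")
if len(units) > 1:
    print("mixed units, convert before merging:", sorted(units))
```

A single unit across all occurrences, as with the real Jy example, is a first sign that the columns can be compared directly.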
What is the map capable of?
Where is all this getting us? Basically two places. First, using this mechanism it is easy to figure out which catalogs hold your columns of interest and then, using the column descriptions and units, whether those catalogs can be meaningfully merged. Even within a single UCD Topic Map, such as the one made from the 100 most popular catalogs, one can look for interesting traits by asking questions like: are there IR-related columns that we could merge with X-ray-related columns in some other catalog? Further, are both these catalogs extragalactic? Or are they likely to provide a way of identifying galactic IR contaminants? And so on. By asking such questions and bringing up the appropriate topics, one can output the (names of the) columns from different tables that one would like to combine. These can then be fed to one of the other VO services (like the toy model by Roy Williams) to perform the actual merge. Topic Maps themselves might be stretched to do this, by working out positional accuracies after the necessary coordinate precession and taking the various error ellipses into account, but that part is best left to a specialized service.
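One way to phrase the "which catalogs hold my columns, and can they be merged" question in code is sketched below. The table names and metadata are hypothetical. UCD families are given as name prefixes, which is an assumption about how the hierarchy is encoded. The unit flag is only a first hint, and the positional cross-match itself is left to a specialized service.

```python
from itertools import permutations

# Hypothetical per-table metadata: table -> [(column, UCD, unit), ...].
TABLES = {
    "ir_survey": [("F12", "PHOT_FLUX_IR_12", "Jy")],
    "ir_photometry": [("Fnu_12", "PHOT_FLUX_IR_12", "Jy")],
    "optical_cat": [("Rmag", "PHOT_MAG_R", "mag")],
}


def in_family(ucd, family):
    """True if ucd is `family` itself or lies below it in the hierarchy
    (assuming the hierarchy follows underscore-separated prefixes)."""
    return ucd == family or ucd.startswith(family + "_")


def merge_candidates(tables, family_a, family_b):
    """Column pairs from two different tables, the first under family_a and
    the second under family_b, flagged by whether their units agree."""
    found = []
    for table_a, table_b in permutations(sorted(tables), 2):
        for col_a, ucd_a, unit_a in tables[table_a]:
            if not in_family(ucd_a, family_a):
                continue
            for col_b, ucd_b, unit_b in tables[table_b]:
                if in_family(ucd_b, family_b):
                    found.append((table_a, col_a, table_b, col_b, unit_a == unit_b))
    return found


for pair in merge_candidates(TABLES, "PHOT_FLUX_IR", "PHOT_MAG"):
    print(pair)
```

The output is only a list of column names per table pair. That list is the kind of thing that would be handed on to a merging service.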
The more exciting possibility is, of course, to merge (the topics, associations and occurrences of) your own table with any of the tables available in the Topic Map. A CGI program lets you make a Topic Map out of your own table (provided it has UCDs and is in a standard format) and merge it with any other Topic Map. The topic naming convention ensures that identical topics (defined through subjectIdentity) are merged.
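The merge rule can be sketched as follows. The sketch assumes both maps derive one subject identifier per table and per UCD from the same naming convention. The example.org identifiers and the dictionary layout are hypothetical. Only the subject-identity rule is shown; XTM also defines other merging conditions, such as identical base names in the same scope.

```python
def merge_topic_maps(tm_a, tm_b):
    """Merge two topic maps whose topics are keyed by subject identifier.

    Each map is {"topics": {subject_id: {"names": set, "occurrences": set}},
                 "associations": set of (type, frozenset of (role, subject_id))}.
    Topics sharing a subject identifier collapse into one, with the union of
    their names and occurrences; identical associations collapse as well.
    """
    topics = {}
    for tm in (tm_a, tm_b):
        for subject, topic in tm["topics"].items():
            merged = topics.setdefault(subject, {"names": set(), "occurrences": set()})
            merged["names"] |= topic["names"]
            merged["occurrences"] |= topic["occurrences"]
    return {"topics": topics,
            "associations": tm_a["associations"] | tm_b["associations"]}


def topic(name):
    return {"names": {name}, "occurrences": set()}


# Hypothetical subject identifiers built by one shared naming convention.
UCD12 = "http://example.org/ucd/PHOT_FLUX_IR_12"
CAT_A = "http://example.org/table/cat_a"
MINE = "http://example.org/table/my_table"

popular = {"topics": {UCD12: topic("PHOT_FLUX_IR_12"), CAT_A: topic("cat_a")},
           "associations": {("contains", frozenset({("table", CAT_A), ("ucd", UCD12)}))}}
mine = {"topics": {UCD12: topic("PHOT_FLUX_IR_12"), MINE: topic("my_table")},
        "associations": {("contains", frozenset({("table", MINE), ("ucd", UCD12)}))}}

merged = merge_topic_maps(popular, mine)
print(len(merged["topics"]), len(merged["associations"]))  # 3 2
```

After the merge, the single PHOT_FLUX_IR_12 topic takes part in both "contains" associations. It therefore links your own table to the catalog that already carried that UCD.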
A good test case would be to take a list of X-ray tables (or a subset of them) and make a UCD Topic Map out of it by passing it to the CGI program. One can then explore it on its own, and also merge it with one of the other UCD TMs to explore the interrelationships between these sets of tables.
Note the External Resources links when you display some topics (e.g. UCDs and tables). These point to resources that do not reside within the Topic Map but are related to the current topic. Any number of links can be provided per topic. For a table, the link currently points to the table on VizieR; for a UCD, the multiple links point to all the tables at VizieR that contain that UCD.
Note that the TM itself deals with just the metadata of the catalog. However, it can pass on column numbers as arguments, which the external links can pick up to do all sorts of interesting things.
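Here is a sketch of how a topic's external link might carry column numbers to another service, assuming that service takes them as URL query parameters. The base URL and the table/col parameter names are placeholders, not an existing VizieR or VO interface.

```python
from urllib.parse import urlencode


def external_link(base_url, table, column_numbers):
    """Link to an external service, passing the table name and the selected
    column numbers as query arguments (one `col` parameter per column)."""
    query = urlencode({"table": table, "col": list(column_numbers)}, doseq=True)
    return f"{base_url}?{query}"


# plot.example.org and the `table`/`col` parameter names are placeholders.
print(external_link("https://plot.example.org/plot", "cat_a", [3, 7]))
# https://plot.example.org/plot?table=cat_a&col=3&col=7
```

urlencode percent-encodes characters such as "/" in catalog names. This means names like II/246 survive the trip intact.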
As a result, many such services can easily be provided.
Is that all?
By no means. Earlier we stated that Topic Maps are to XML files what SQL is to databases. Well, we were overstating it there. One of the powerful aspects of SQL is its ability to query, and without some kind of querying capability TMs would be that much less useful. A full-blown Topic Map Query Language (TMQL) is being developed. Until then, we can execute queries using a syntax (Ontopia's tolog) that combines the good qualities of SQL and Prolog.
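For readers without Omnigator at hand, the sketch below reproduces in plain Python roughly what a query such as "select $A, count($B) from isParentUCD($A : ucd, $B : ucd) order by $B desc?" returns. For each parent UCD, it gives the number of UCDs directly below it, largest first. The (parent, child) pairs are hypothetical, and reading $A as the parent is an assumption.

```python
from collections import Counter

# Hypothetical (parent, child) pairs standing in for isParentUCD associations.
IS_PARENT_UCD = [
    ("PHOT_FLUX", "PHOT_FLUX_IR"),
    ("PHOT_FLUX", "PHOT_FLUX_RADIO"),
    ("PHOT_FLUX_IR", "PHOT_FLUX_IR_12"),
    ("PHOT_FLUX_IR", "PHOT_FLUX_IR_25"),
    ("PHOT_FLUX_IR", "PHOT_FLUX_IR_60"),
    ("PHOT_MAG", "PHOT_MAG_R"),
]


def children_per_parent(pairs):
    """Count children per parent, largest count first (ties alphabetical)."""
    counts = Counter(parent for parent, _ in pairs)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for ucd, n in children_per_parent(IS_PARENT_UCD):
    print(ucd, n)
# PHOT_FLUX_IR 3
# PHOT_FLUX 2
# PHOT_MAG 1
```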
Additionally, the Topic Map can be indexed. The text input box at the top of the page accepts text to search for, and the results are returned with a significance number attached. The query and search facilities make the Topic Map very versatile.
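The page does not say how the significance number is computed. As a stand-in, the sketch below ranks topics with a plain TF-IDF score over a hypothetical index of topic texts. This is a swapped-in technique for illustration, not the scoring Omnigator actually uses.

```python
import math
from collections import Counter

# Hypothetical index: topic id -> text indexed for that topic.
INDEX = {
    "PHOT_FLUX_IR_12": "flux density 12 micron infrared",
    "PHOT_MAG_R": "magnitude R band",
    "Jy": "jansky flux density unit radio infrared",
}


def search(index, query):
    """Rank topics by a TF-IDF score for the query terms, highest first."""
    docs = {key: text.lower().split() for key, text in index.items()}
    doc_freq = Counter(term for words in docs.values() for term in set(words))
    hits = []
    for key, words in docs.items():
        if not words:
            continue  # nothing indexed for this topic
        counts = Counter(words)
        score = sum(counts[term] / len(words) * math.log(len(docs) / doc_freq[term])
                    for term in query.lower().split() if doc_freq[term])
        if score > 0:
            hits.append((key, round(score, 3)))
    return sorted(hits, key=lambda hit: -hit[1])


for key, score in search(INDEX, "infrared flux"):
    print(key, score)
```

Terms that occur in every indexed topic get a zero weight under this scheme. They therefore do not help to separate the results.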
What exactly does the Topic Map generator do?
The input to the CGI program that generates the Topic Map is a list of tables. The XML files corresponding to these catalogs contain the UCDs along with the field names, i.e. the metadata. If present locally, these files are used directly; otherwise they are fetched from VizieR using wget and the URL scheme:
Some of these catalogs actually have a set of catalogs associated with them. It is first determined whether each catalog is solitary or part of a bunch. If the latter, the XML files for the subcatalogs are obtained in the same way and combined with the solitary catalogs. (Catalogs without a proper UCD structure can be rejected.)
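The selection step can be sketched as a small work queue. The sketch assumes the metadata for each catalog has already been reduced to a dictionary. The dictionary layout and the "fetch" callable that stands in for the wget download are hypothetical, and the rejection rule is only one possible reading of "without a proper UCD structure".

```python
from collections import deque


def collect_catalogs(names, local, fetch):
    """Resolve catalog names into {catalog: [(column, UCD), ...]}.

    local: metadata already on disk, keyed by name (stands in for local XML files).
    fetch: callable returning metadata for a name (stands in for the wget download).
    Metadata is a dict holding either "subcatalogs" (a bunch) or "columns" (solitary).
    """
    selected, seen = {}, set()
    pending = deque(names)
    while pending:
        name = pending.popleft()
        if name in seen:
            continue
        seen.add(name)
        meta = local[name] if name in local else fetch(name)
        if meta.get("subcatalogs"):
            # A bunch: its subcatalogs are obtained the same way.
            pending.extend(meta["subcatalogs"])
            continue
        columns = meta.get("columns", [])
        # Assumed rejection rule: no columns, or any column without a UCD.
        if columns and all(ucd for _, ucd in columns):
            selected[name] = columns
    return selected


LOCAL = {"cat_a": {"columns": [("F12", "PHOT_FLUX_IR_12")]}}
REMOTE = {  # hypothetical catalogs that are not on disk
    "bunch_b": {"subcatalogs": ["bunch_b/1", "bunch_b/2"]},
    "bunch_b/1": {"columns": [("Rmag", "PHOT_MAG_R")]},
    "bunch_b/2": {"columns": [("Note", "")]},  # a column without a UCD: rejected
}
fetched = []


def fake_fetch(name):
    fetched.append(name)
    return REMOTE[name]


print(collect_catalogs(["cat_a", "bunch_b"], LOCAL, fake_fetch))
print(fetched)
```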
The UCDs and their corresponding column names for each catalog are then extracted, and a single XTM (XML Topic Maps) document, NAME.xtm, is formed and moved to the directory where the Topic Map server looks for Topic Maps.
But how is this any different or better than all the other services out there?
Most services deal with single catalogs. Of the others, most assume certain types of associations in the data mixing they provide, e.g. image overlays, finding charts etc. The UCD TM instead provides a resource that you can keep exploring to look for whatever hidden connections are out there. Moreover, you can always make a modified Topic Map of your own to incorporate any of your hunches and explore further. You can even trivially make entirely different TMs based on this data. TMs are more than interactive. They are alive.
The possibilities are limitless.
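Example queries in the syntax described under "Is that all?":
instance-of($A,ucd)?
lists all UCDs.
select $A, count($B) from isParentUCD($A : ucd, $B : ucd)?
counts, for each UCD, the UCDs linked to it through the isParentUCD association.
select $A, count($B) from isParentUCD($A : ucd, $B : ucd) order by $B desc?
gives the same counts, sorted in descending order.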
Please browse and provide comments.
Ashish Mahabal
Last outdated on: Aug 18, 2002