Author Archive
BODC’s New data catalogue
by Jens on Nov.24, 2010, under Online Data Sources
The British Oceanographic Data Centre (BODC) has just announced a new facility on their website to search and retrieve data series directly from the web. While a lot of data could be retrieved before, this catalogue truly opens up access across all categories and projects, with over 76,000 data series being put online in a searchable format. The series are mainly CTD casts, but also include bathymetry, meteorology, optical properties, wave data and more.
The great thing is that data is available in several recognised formats (NetCDF, ODV and ASCII files), so virtually everyone in the field can access this data in a preferred format.
There are some limitations in the way you can refine searches, but most of them make sense from the perspective of optimising searches and not hanging up the server with searches that return virtually everything.
Once you have narrowed your search criteria to return 1,000 series or fewer, you can retrieve results. There’s the option of downloading a KML file of coverage, and you can retrieve data in your preferred format.
It is important to note that we’re talking data series, not individual points here, so even a single series can contain thousands of data points, giving you access to a seriously large amount of oceanographic data with a wide geographic coverage.
The initial map on the start page shows waters around Britain, but make sure you either zoom out or pan around, as there is data from a much wider region – virtually all of the world – than what is shown on the map.
You do have to register an account with BODC in order to check out your “data shopping”, but there is a huge amount of data freely available. The map tells you up front which data series are freely available.
BODC has truly made their data a lot more accessible with this exercise.
Ontology Part 4: Digging a bit deeper
by Jens on Nov.03, 2010, under General, Online Data Sources
- Syntactic Challenges – e.g. different models and languages
- Schematic Challenges – e.g. structural differences
- Semantic differences – e.g. different meanings and understandings.
Get INSPIRE’d with new presentations
by Jens on Oct.12, 2010, under Data Management, Legal, Online Data Sources
As the December deadline for the first batches of INSPIRE Annex I metadata approaches, people are waking up to the work ahead. The UK Location Programme has been working hard at developing solutions, and preparing information for anyone needing to meet the INSPIRE directive in making spatial data available and accessible.
The latest culmination was a Data Providers workshop, where a set of presentations laid out what will be expected in terms of delivering data, and what tools are going to be made available to help this work. The presentations are now online, and I’d recommend that anyone who finds themselves having to publish spatial data under INSPIRE has a look at them. They are a great introduction to the broad concepts, while there is also a good amount of detail for the more technically minded, which helps in making decisions on the use of these tools or other services.
Ontology Part 3: Sharing it
by Jens on Sep.22, 2010, under Data Management, Online Data Sources, Software
In the last couple of posts I have been talking about ontology tools. In the meantime, I have been working a bit with Protege, getting a basic skeleton up (50 entities/classes, a similar number of instances, a handful of object relations, and some associated data fields). Now, however, I have reached the point where I need to start sharing it with a group of colleagues. Not everyone in this group will be au fait with running Protege and delving into the bowels of OWL files.
At first I looked at the simple exporter function, OWLDoc, which will drop your ontology into plain HTML and which will probably end up being the basic option for starters. It isn’t pretty without at least some CSS work done to it, but it still saves you explaining aspects of new software to someone who really should be contributing their knowledge and expertise about the subject in the ontology, not becoming a full-time editor.
There is also a neater Apache Tomcat servlet, ontology-browser, which seems to work very nicely. The slight caveat there is that I don’t think we have a spare server lying around for running it on. Remote hosting could be an option, but is not ideal, given the nature of some of the information that may end up in the ontology.
Finally, I stumbled across a presentation on SlideShare, talking about implementation of semantic import into Drupal. This has gotten me rather excited as I am a long-time Drupal user, and like the extensibility of the system. There are already ideas buzzing around on the potential power of combining importable ontologies directly with web-based presentation material of different instances. I have still to try it, and it raises some similar server issues to the ontology browser, but I still thought I would share the presentation here.
Ontology Part 2: Homing in
by Jens on Sep.15, 2010, under Data Management, Software
After trying out a few ontology tools, I am slowly settling on Protege, which seems to have a decent mixture of straightforward entity building with additional support for tracking object properties, data properties and individuals associated with classes. Given that this exercise is primarily about building a catalogue of activities whereby data from several underlying systems can be identified, the relative ease with which it can be expanded is important.
Most of the ontology tools I have looked at are Java based. This of course has the advantage of making them more distributable across operating systems, but can also have some quite severe memory implications. So far, though, Protege seems to work well without any signs of slowdowns.
In addition there are a number of plug-ins, which I am just starting to look into, but the ability to create forms based on the data definitions using the frames extension seems promising.
So at least for now, Protege seems like the choice for building a relatively simple ontology while keeping it scalable with the possibility of associating data directly.
The Search Continues: Simple ontology/Controlled vocabulary
by Jens on Sep.14, 2010, under Data Management, Software
As part of building a framework model for managing a rather varied collection of data, I’m seeking to develop a controlled vocabulary of activities to help facilitate better searching. However, I’ve now spent the best part of today rooting around in a long string of very impressive ontology managers that all seem quite capable of going the full mile if you want to develop semantic web applications. But I’m still coming up short on something a bit simpler – construction of a simple controlled vocabulary based on a relatively straightforward taxonomy. Maybe it’s me, still getting my head around the finer workings of semantics (pun intended).
Integration in Taxonomy.
by Jens on Aug.26, 2010, under Data Management, Marine Life, Online Data Sources
My background was in zooplankton ecology before I walked down the data management route. As such, I still keep an eye on things in this area, and the ICES Study Group on Integrated Morphological Taxonomy (SGIMT) has recently released its report, wherein the recommendation is put forward to standardise marine taxonomic nomenclature. No big surprises there. It is a problem that we’re all too familiar with: there are lots of areas where things should be controlled better, but you have inherited a system with 10,000 old versions of names and there is neither the time nor the money to update it all.
However, the World Register of Marine Species (WoRMS) includes a Taxon Match facility, which will match up your list of species names with the authoritative list and provide you with additional reference information, including ITIS codes (TSN), Aphia ID, authorities, kingdom, phylum etc., which gives you a good chance of restructuring and updating older lists which may have drifted. I tried it out on a list of approximately 7,000 species of phyto- and zooplankton (although you have to break it down into chunks of a maximum of 1,500 records in a single match), and generally got about a 60-70% match. It’s pretty nice to have cleared up nearly 5,000 records for an hour’s work rather than a long and painful search of each individual line.
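Splitting a long species list into batches is easy enough to script. Here is a minimal sketch in Python, assuming the names are already sitting in a plain list. The function names `clean_names` and `chunk_names` are my own inventions and not part of any WoRMS tooling, and the 1,500 limit is simply the figure I ran into when using the Taxon Match page.

```python
from typing import Iterable, Iterator, List

# Assumption: the per-match limit mentioned in the post.
MAX_RECORDS_PER_MATCH = 1500


def clean_names(raw_names: Iterable[str]) -> List[str]:
    """Collapse stray whitespace, drop blanks and drop exact duplicates,
    keeping the first occurrence of each name in its original order."""
    seen = set()
    cleaned = []
    for raw in raw_names:
        name = " ".join(raw.split())
        if name and name not in seen:
            seen.add(name)
            cleaned.append(name)
    return cleaned


def chunk_names(names: List[str],
                size: int = MAX_RECORDS_PER_MATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of `names`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(names), size):
        yield names[start:start + size]


if __name__ == "__main__":
    # Hypothetical stand-in for a real species list of about 7,000 names.
    species = clean_names(f"Species {i}" for i in range(7000))
    for number, batch in enumerate(chunk_names(species), start=1):
        print(f"batch {number}: {len(batch)} names")
```

Each batch can then be saved to its own file and matched separately. Cleaning up whitespace and duplicates first is just a guess at what helps; it keeps the batches small but will not fix genuinely misspelt names.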
Information design
by Jens on Aug.24, 2010, under General
Yet another TED Talk plugged! This time it is a slightly longer talk around the concepts of visualising large amounts of data in a manner that enhances understanding – or information design for short.
The talk is by David McCandless, a data journalist, who showcases different aspects of visualising multiple data sets in a single infographic. Refreshingly, he also talks about caveats in the ways of aggregating and considering the visual data, essentially encouraging a critical sense in the journalistic fields creating such infographics, and raising awareness in the consumer at the same time.
There is little doubt that the complex tasks of aggregating and visualising larger and larger data sets are an area of increasing attention, and it is nice to see some of the potential output, even if it does not focus on marine data.
OBIS Seamap
by Jens on Aug.23, 2010, under Marine Life, Online Data Sources
OBIS (Ocean Biogeographic Information System), originally established under the Census of Marine Life, is an impressive alliance of people working to make biogeographic data available. As a whole, they now hold well over 27 million records and 849 data sets, which are accessible through the portal.
However, I thought I’d like to highlight a particular aspect – the OBIS Seamap. It includes observations on marine mammals, seabirds and sea turtles, as well as access to environmental variables. In addition, there are links to a wide range of tools and additional databases ranging from photographic fin matching to sea turtle nesting sites.
However, it is the functionality of this site that is really impressive. Beyond the ability to search through over 2 million observations by data set, species, location etc., you are also able to extract all the relevant information directly to freely available mapping tools such as Google Earth (export a KML file to work on, based on your search results) as well as OGC-compliant formats for web mapping or file services. Altogether, the strong presentation of data sets combined with a well laid out and thought out set of functionalities demonstrates a very competent site, which will hopefully serve as inspiration to others looking to publish large volumes of marine data online.
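If you would rather pull the points out of an exported KML file than open it in Google Earth, the Python standard library is enough. The sketch below is only illustrative: the sample document is made up, the function name `placemark_points` is my own, and I am assuming the file uses the standard KML 2.2 namespace with simple Point placemarks, which a real export may or may not do.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumption: the export declares the standard OGC KML 2.2 namespace.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def placemark_points(kml_text: str) -> List[Tuple[str, float, float]]:
    """Return (name, longitude, latitude) for every Point placemark."""
    root = ET.fromstring(kml_text)
    points = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext(
            ".//kml:Point/kml:coordinates", default="", namespaces=KML_NS
        )
        # KML stores coordinates as "lon,lat[,alt]".
        parts = coords.strip().split(",")
        if len(parts) >= 2:
            points.append((name, float(parts[0]), float(parts[1])))
    return points


# Made-up sample standing in for a real search export.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Sighting A</name>
      <Point><coordinates>-4.5,50.2,0</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>Sighting B</name>
      <Point><coordinates>-3.1,51.0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""

if __name__ == "__main__":
    for name, lon, lat in placemark_points(SAMPLE_KML):
        print(f"{name}: lon={lon}, lat={lat}")
```

Note that longitude comes before latitude in KML, which is the opposite of how most of us read positions aloud.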
Death of the web and all things in it..again
by Jens on Aug.19, 2010, under General
It would seem just about every conceivable part of the web and its associated technology has been declared dead, according to the collection over at Technologizer.
I guess that, working with data management, it is something that we’re used to. Large projects are declared dead before they hit the streets and the tech is outdated before the final user acceptance testing has rolled through. But obviously, in spite of the various obituaries of web and projects, things still chug along.