MarBlog

Data Management

The Search Continues: Simple ontology/Controlled vocabulary

by on Sep.14, 2010, under Data Management, Software

As part of building a framework model for managing a rather varied collection of data, I’m seeking to develop a controlled vocabulary of activities to facilitate better searching. However, I’ve now spent the best part of today rooting around in a long string of very impressive ontology managers that all seem quite capable of going the full mile if you want to develop semantic web applications. But I’m still coming up short on something a bit simpler: construction of a simple controlled vocabulary based on a relatively straightforward taxonomy. Maybe it’s me, still getting my head around the finer workings of semantics (pun intended) :-)
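To make the idea concrete, here is a minimal sketch in Python of what such a simple controlled vocabulary could look like: a table of broader-term links plus a synonym table, with a helper that expands a search term to everything filed beneath it. All activity terms and function names below are hypothetical placeholders, not taken from any real vocabulary or tool.

```python
from collections import deque

# Hypothetical activity terms: each narrower term points to its broader term.
BROADER = {
    "trawl survey": "fisheries survey",
    "acoustic survey": "fisheries survey",
    "fisheries survey": "survey",
    "plankton sampling": "survey",
}

# Hypothetical non-preferred terms mapped to the preferred term.
SYNONYMS = {
    "trawling": "trawl survey",
    "zooplankton sampling": "plankton sampling",
}


def preferred(term):
    """Normalise a term and swap a known synonym for its preferred form."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


def expand(term):
    """Return the preferred term followed by every narrower term below it."""
    root = preferred(term)
    found = [root]
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child, parent in BROADER.items():
            if parent == current and child not in found:
                found.append(child)
                queue.append(child)
    return found
```

A search for "fisheries survey" would then also pick up records tagged "trawl survey" or "acoustic survey", which is most of what a plain taxonomy needs to do for searching, without any of the semantic web machinery.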

Integration in Taxonomy

by on Aug.26, 2010, under Data Management, Marine Life, Online Data Sources

My background was in zooplankton ecology before I walked down the data management route. As such I still keep an eye on things in this area, and the ICES Study Group on Integrated Morphological Taxonomy (SGIMT) has recently released its report, in which it recommends standardising marine taxonomic nomenclature. No big surprises there. It is a problem we’re all too familiar with: there are lots of areas where things should be controlled better, but you have inherited a system with 10,000 old versions of names and there is neither the time nor the money to update it all.

However, the World Register of Marine Species (WoRMS) includes a Taxon Match facility, which will match up your list of species names with the authoritative list and provide you with additional reference information, including ITIS codes (TSN), Aphia ID, authorities, kingdom, phyla etc. This gives you a good chance of restructuring and updating older lists which may have drifted. I tried it out on a list of approximately 7,000 species of phyto- and zooplankton (although you have to break it down into chunks of at most 1,500 records per match), and generally got about a 60-70% match. It’s pretty nice to have cleared up nearly 5,000 records for an hour’s work rather than a long and painful search of each individual line.
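Splitting a long list into pieces that fit under the 1,500-record limit is easy to script. The following is a minimal sketch in Python; the function name and the placeholder species names are assumptions for illustration only, and it does not talk to the Taxon Match facility itself, it only prepares the batches you would then submit.

```python
def chunks(names, size=1500):
    """Yield consecutive slices of `names`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(names), size):
        yield names[start:start + size]


# Placeholder data standing in for a real list of about 7,000 species names.
names = [f"Species {i}" for i in range(7000)]

# Five batches: four of 1,500 names and a final one of 1,000.
batches = list(chunks(names))
```

Each batch can then be written out to its own file and matched separately, and the results stitched back together in the original order.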

Getting started on INSPIRE

by on Jul.28, 2010, under Data Management, Legal, Online Data Sources

The EU INSPIRE Directive places a very significant amount of work on public authorities to make their spatial data available and harmonised. The steps to achieving INSPIRE compliance and making data available for re-use can seem like a long and arduous task (and likely will be in many cases), especially if you start out by reading the documentation hosted on the INSPIRE website. It’s not that the documentation is bad. It is very exhaustive, and perhaps a little too exhaustive for people who are just beginning to dip a toe in and find out what needs to be done.

Enter the UK Location Programme’s new “Getting Started” page, with a series of short guides that give an overview of what is required at the data manager level to publish location information. Obviously these guides are geared directly towards publishing to the UK Location Programme and the data.gov.uk portal, but even if you are from another country or a devolved administration, they provide some useful information about what needs to be done.

Currently there are four guides published, with another six planned or in production, and if you are feeling a little daunted by the magnitude of INSPIRE, this is a good place to start.

Rules about privacy

by on Jun.29, 2010, under Data Management, Online Data Sources

This transcript of a talk at www2010 introduces some good summary rules about privacy in data. The talk focuses on “Big Data”, which in this context relates mostly to social, market and customer data. However, every organisation or public authority looking at releasing data, sharing data or simply just managing it will benefit from considering the points it makes. I have picked a quote for each point to illustrate it:

Security through obscurity is a reasonable strategy 

You may think that they shouldn’t rely on being obscure, but asking everyone to be paranoid about everyone else in the world is a very very very unhealthy thing. 

Not all publicly accessible data is meant to be publicised

Some may hope that their content is widely distributed, but many more figure that it will only be consumed by the appropriate people

People who share PII aren’t rejecting privacy

Too many people working with Big Data assume that people who give out PII want their data to be aggregated and shared widely.

Aggregating and distributing data out of context is a privacy violation

Context still matters.  It shapes the data that’s produced and what people’s expectations are.

Privacy is not access control

We have a long history of thinking of content as public or private, of representing privacy through numerical sequences like 700.  But this collapses two things: privacy and accessibility.

..And publicity twists it all

 Just because we can aggregate and redistribute data, should we?  

Now, these topics, presented by Danah Boyd, obviously focus on social networks and data with lots of user information. But most of them scale quite well to marine and scientific data in general. The context changes slightly, and often the question of privacy is less demanding, although it still needs to be considered. Often the privacy-related matters can almost be equated to ensuring the quality of the data and respecting the scientists who collected it.

There are currently big drives in legislation to push data out to the public. I for one welcome this, but publishing the data does raise questions of suitability. At what level of granularity do we actually publish the data, and at what level is it enough that they are available on request? I would like to think that most data can be published, provided documentation and quality can be determined, but there is of course the question of obscurity. How obscure are the data, and is the effort required to make them publicly searchable, downloadable, viewable, etc. justifiable if only a limited group of experts is going to look at them anyway?

Today’s Reading

by on Apr.29, 2010, under Data Management, Legal, Online Data Sources

European Commission Report on Legal Aspects of Marine Environmental Data.

Framework Service Contract No. FISH/2006/09 – LOT2

Perhaps a little outdated, being from October 2008, considering the implementation of INSPIRE, but so far I’m finding it a useful and extensive resource.

Update: The INSPIRE legislation itself is reasonably well covered within the report, which has a lot of relevant information. However, the national implementation plans are typically only surfacing now, so they are not considered.
