VIVO mapping document

For a while now I have been meaning to upload the VIVO mapping document that we used to develop our feed to VIVO.

For public consumption I have removed our table definitions, but I think it should still provide a useful template.

With huge thanks to Anna Morely for coming up with such a clear requirements format.

VIVO Mapping_Document-public

VIVO ANDS Ontology

The new ANDS VIVO ontology can be found here:

http://purl.org/ands/ontologies/vivo/

VIVO Community Day – thank you

Thank you to everybody who participated in the VIVO Community Day.

Feedback from the day has been that everybody found it useful to be exposed to the broader VIVO movement vision. It was great to get perspectives from multiple institutions on VIVO implementation challenges – a highlight being a demonstration of Griffith University’s VIVO research hub.

A second highlight was the willingness of the Australian community to collaborate on improvements to using VIVO as a metadata store, and the recognition of the strategic alignment between these efforts and work both in New Zealand and at Cornell University (DataStaR). A write-up of the afternoon's collaborative development workshop should follow shortly.

A special thank you to Brian, Huda, and Stephen for making the trip out to Australia and sharing their perspectives, and to Symplectic and ANDS for making it possible.

As promised, the slides from the morning’s presentation are now available:

VIVO Community Day Main Presentation

DataStaR Community Day Presentation

VIVO Community Day 6th February 2012

On 6 February 2012, the University of Melbourne will be hosting a VIVO Community Day, sponsored by Symplectic and the Australian National Data Service.

The community day will be an opportunity to get an overview of VIVO developments, as well as to engage with senior VIVO developers from Cornell University and the University of Florida. Attendance is free.

Registrations for this event can be made at: http://vivocommunityday.eventbrite.com/

Draft Agenda

Location: Level 1, Alan Gilbert Building (corner of Barry and Grattan Streets)

Morning

Welcome

Towards building an international Network of Researchers, the VIVO vision

VIVO the Application

VIVO the Ontology

VIVO Development Road Map

VIVO Collaboration Initiatives

Eagle I, an Ontology for Research Infrastructure

  • EuroCRIS
  • Harvard Catalyst Profiles
  • ORCID

Getting the most out of VIVO – bells, whistles, and internal integration

  • VIVO Visualizations
  • VIVO website plugins
  • VIVO mini grant report

Morning Tea

Round table reports: Sharing Our VIVO implementation Experiences

Getting data into VIVO: approaches
Using the VIVO Harvester/Symplectic Connectors
Selling/Promoting VIVO: Staff Profiling/ Data Management

Lunch

Using VIVO as a metadata Store
dataStaR
VIVO ANDS collaboration Agenda
VIVO at Griffith
Data Capture Integration techniques: QUT/Griffith/VeRSI

Afternoon Tea

Workshop: Working together: a collaborative development agenda for VIVO metadata store development

Workshop 2: Hands-on with VIVO

Workshop 3: Research Admin Systems and VIVO

Find an Expert to be re-implemented on VIVO

The University of Melbourne’s Research Systems Upgrade project has received approval to re-implement Find an Expert on the VIVO platform.

(Official announcement here: http://www.its.unimelb.edu.au/__data/assets/pdf_file/0010/480097/Find_an_Expert_on_VIVO.pdf )

About Find an Expert

Since its launch in 2007, Find an Expert has had a significant impact on the way research expertise is communicated at the University, both internally and externally to the media and broader community. Web statistics for Find an Expert highlight its impact:

  • Over 6.3 million pages have been accessed from Find an Expert by more than 1.8 million visitors
  • Find an Expert is accessed daily by the major Australian media outlets such as the Australian Broadcasting Corporation, News Limited, and Fairfax
  • Find an Expert has also played an important part in connecting researchers to the University community, with one sixth of its usage coming from on campus sources
  • Links into Find an Expert can be found across 200 separate university websites.

Increasingly Find an Expert is also being used as a researcher’s primary home page.

Implementation Timelines

Find an Expert is scheduled to be migrated to VIVO by March 2012. Before then, early releases will be opened up in parallel with the main Find an Expert site for user community feedback.

Development screenshots of what Find an Expert will look like on VIVO can be found here: http://dl.dropbox.com/u/29677457/VIVO13FAEScreenshots.pdf

Find an Expert will become a core part of the Research data Registry as it will provide the URIs for people, publications, and grants required for the registry.

SEMANTIC WEB for the WORKING ONTOLOGIST

In preparing to update the ANDS VIVO ontology, I found the following book really useful:

SEMANTIC WEB for the WORKING ONTOLOGIST (http://workingontologist.org/)

Using inference to streamline data entry

In a world where we ask our research community to add an ever increasing amount of metadata to an ever increasing number of things, it makes sense to employ strategies that reduce the amount of metadata that we need to enter from scratch.

One strategy employed successfully in social networking applications is to use the information already known about you to create new knowledge that is probably (or at least possibly) true, if only you would validate it. A good example of this is the ‘people you may know’ functionality in both LinkedIn and Facebook.

If we know about a researcher’s publications and grants (and we have this information expressed in the VIVO Ontology), then we can use this information in research data workflows to reduce the amount of information that we need to enter manually.

If I know that a research data set that I am about to enter is associated with a particular publication, then by looking at the publication I can infer that:

  • The authors of the publication are probably the collectors of the data set
  • At least one of the departments associated with the positions of the authors could also be listed as a collector
  • The subject areas associated with the publication are probably also associated with the research data set

For a research dataset associated with a publication that has 10 authors and potentially 6 classification codes, this represents a significant reduction in data entry.
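To make the pattern concrete before turning to SPARQL, here is a minimal Python sketch of the same inference run over a handful of in-memory triples. It is only an illustration: the identifiers such as pub1, alice and physics are made up, the triples are plain strings rather than real URIs, and the result is a list of suggestions for a person to confirm, not asserted facts.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def suggest_dataset_metadata(triples, dataset, publication):
    """Return suggested triples for `dataset`, inferred from `publication`."""

    def objects(subject, predicate):
        # All objects of triples matching the given subject and predicate.
        return [t.obj for t in triples
                if t.subject == subject and t.predicate == predicate]

    suggestions = []
    for authorship in objects(publication, "core:informationResourceInAuthorship"):
        for person in objects(authorship, "core:linkedAuthor"):
            # Authors of the publication are probable collectors.
            suggestions.append(Triple(dataset, "ands:hasCollector", person))
            for position in objects(person, "core:personInPosition"):
                for org in objects(position, "core:positionInOrganization"):
                    # So are the departments in which the authors hold positions.
                    suggestions.append(Triple(dataset, "ands:hasCollector", org))
    for area in objects(publication, "vivo:hasSubjectArea"):
        # Subject areas of the publication probably apply to the data set too.
        suggestions.append(Triple(dataset, "vivo:hasSubjectArea", area))
    # Remove duplicates (e.g. two authors in one department), keeping order.
    return list(dict.fromkeys(suggestions))


# Hypothetical toy data: one publication, two authors in the same department.
EXAMPLE_TRIPLES = [
    Triple("pub1", "core:informationResourceInAuthorship", "authorship1"),
    Triple("pub1", "core:informationResourceInAuthorship", "authorship2"),
    Triple("authorship1", "core:linkedAuthor", "alice"),
    Triple("authorship2", "core:linkedAuthor", "bob"),
    Triple("alice", "core:personInPosition", "position1"),
    Triple("bob", "core:personInPosition", "position2"),
    Triple("position1", "core:positionInOrganization", "physics"),
    Triple("position2", "core:positionInOrganization", "physics"),
    Triple("pub1", "vivo:hasSubjectArea", "optics"),
]
```

Calling suggest_dataset_metadata(EXAMPLE_TRIPLES, "dataset1", "pub1") yields four suggestions: two people, one department (listed once even though both authors belong to it) and one subject area.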

Using SPARQL, the query language for RDF triple stores, I can CONSTRUCT these probable statements:

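PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX core: <http://vivoweb.org/ontology/core#>
# vivo: is assumed here to be the same namespace as core:
PREFIX vivo: <http://vivoweb.org/ontology/core#>
# ands: is assumed to be the ANDS VIVO ontology namespace
PREFIX ands: <http://purl.org/ands/ontologies/vivo/>

# Suggest collectors and subject areas for the new data set.
# In practice, replace ?publication with the URI of the known publication.
CONSTRUCT {
  <http://data/myNewDataset> ands:hasCollector ?person ;
                             ands:hasCollector ?org ;
                             vivo:hasSubjectArea ?subjectArea .
}
WHERE {
  ?publication core:informationResourceInAuthorship ?la .
  ?la core:linkedAuthor ?person .
  ?person core:personInPosition ?position .
  ?position core:positionInOrganization ?org .
  ?publication vivo:hasSubjectArea ?subjectArea .
}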

Another set of useful inferences is between a grant and the research data that it produces:

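# Prefixes repeated so that the query stands on its own;
# both namespaces are assumed, as for the publication query.
PREFIX vivo: <http://vivoweb.org/ontology/core#>
PREFIX ands: <http://purl.org/ands/ontologies/vivo/>

# In practice, replace ?grant with the URI of the known grant.
CONSTRUCT {
  <http://data/myNewDataset> ands:hasCollector ?person ;
                             ands:hasCollector ?org ;
                             vivo:hasSubjectArea ?subjectArea .
}
WHERE {
  ?grant vivo:relatedRole ?role .
  ?role vivo:investigatorRoleOf ?person .
  ?grant vivo:administeredBy ?org .
  ?grant vivo:hasSubjectArea ?subjectArea .
}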

We are currently working on a generic way to integrate both of these inference patterns into VIVO data entry workflows. What is really great about these patterns is that, because they are constructed with reference to an externally defined ontology, they are perfectly suited to constructing inferences across systems. Say I have some research data coming off a machine. If I know the URI of the associated grant in my institution's VIVO system, then by sending the second query to a centrally provided SPARQL endpoint I can quickly attach much richer metadata to my actual data as it comes off the machine.

(I am really looking forward to a time soon when this is possible).
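As a rough sketch of what that machine-side step might look like, the Python below fills a known grant URI into the grant query and wraps it in a SPARQL Protocol POST request. Everything specific here is an assumption for illustration: the endpoint URL and example URIs are hypothetical, the vivo: and ands: namespaces are assumed as above, and whether a given endpoint returns Turtle for a CONSTRUCT query depends on its configuration. The request is built but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Namespaces are assumptions; adjust them to the ontologies actually in use.
GRANT_QUERY = """\
PREFIX vivo: <http://vivoweb.org/ontology/core#>
PREFIX ands: <http://purl.org/ands/ontologies/vivo/>

CONSTRUCT {{
  <{dataset}> ands:hasCollector ?person ;
              ands:hasCollector ?org ;
              vivo:hasSubjectArea ?subjectArea .
}}
WHERE {{
  <{grant}> vivo:relatedRole ?role .
  ?role vivo:investigatorRoleOf ?person .
  <{grant}> vivo:administeredBy ?org .
  <{grant}> vivo:hasSubjectArea ?subjectArea .
}}
"""

# Characters that must not appear inside <...> in a SPARQL query.
FORBIDDEN = set('<>"{}|^`\\')


def check_uri(uri):
    """Reject strings that cannot safely be written as <uri> in SPARQL."""
    if not uri or any(ch in FORBIDDEN or ch.isspace() for ch in uri):
        raise ValueError(f"not a usable URI: {uri!r}")
    return uri


def build_grant_query(dataset_uri, grant_uri):
    """Fill the data set and grant URIs into the CONSTRUCT template."""
    return GRANT_QUERY.format(dataset=check_uri(dataset_uri),
                              grant=check_uri(grant_uri))


def build_request(endpoint_url, query):
    """Build (but do not send) a form-encoded SPARQL Protocol POST request."""
    body = urlencode({"query": query}).encode("utf-8")
    headers = {
        "Accept": "text/turtle",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(endpoint_url, data=body, headers=headers, method="POST")
```

Passing the resulting request to urllib.request.urlopen would send it. The URI check matters because the URIs are pasted into the query text, so anything that could close the angle brackets early is refused.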

VIVO and VITRO

Throughout this blog, you will see VITRO and VIVO used interchangeably. This isn’t a deliberate ploy to confuse people; it is just that the Vivoweb team have used an underlying semantic software system called Vitro to build their VIVO system and deploy their VIVO ontology.

Latterly, it makes more sense to refer to building a metadata store on top of VIVO rather than VITRO, as the project made an early decision to integrate with the VIVO ontology and to build our work on top of the VIVO releases.

More information on the underlying Vitro and its other uses can be found here: http://vitro.mannlib.cornell.edu/

Cornell and the University of Florida upgrade their VIVO instances

Recently, both Cornell and the University of Florida have upgraded their VIVO instances.
See:
 http://vivo.cornell.edu/

and
 http://vivo.ufl.edu/

The ANDS community should take particular note of the integration of equipment records in the Cornell version – yet another useful thing to connect with research data, and something that could, with little effort, be represented as an ANDS service.

Some other features worth observing:

The RDF for each page can be downloaded

Cool network diagrams for researchers

A much improved navigation interface

On Deploying VIVO

Nick Nicholas from ANDS makes some excellent suggestions on what institutions should consider when deploying VIVO (VITRO) as a metadata store:

If you’re going to deploy VITRO,

1. Ensure that you’re getting timely feeds of information about parties, activities, and collections from the various sources in your institution.

VITRO needs to act as an aggregator for information about parties, activities and collections, which is held in different sources in an institution. Aggregating this information is a challenge for all metadata store solutions: you will need to get up-to-date information out of HR (parties), the Research Office (activities), the Library (collections), and departments and research centres (collections), in formats which can be processed into VIVO triples. Moreover, you will need consistent Linked Data persistent URIs for all these entities, so that relations between entities can be represented through VIVO (and translated into RIF-CS).

VITRO has come from the US university system, to address the fragmentation of information sources within universities there; so it is set up to deal with a diversity of sources, and supports both manual and automated feeds, including Excel and CSV: http://www.vivoweb.org/data-ingest-guide. But it is in your institution’s interest to set up workflows to ensure data is kept up to date.

In a few instances, institutions already have feeds into a data warehouse set up to aggregate this information; VIVO then need merely be a layer on top of that warehouse. But if you will be using VIVO to serve as your data warehouse, you will also need to build the feeds into VIVO. Often there is quite a large number of sources of truth which you will need to reconcile; a dozen is not unheard of.
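To illustrate what processing a feed into triples with consistent URIs can involve, here is a minimal Python sketch that turns a CSV extract into N-Triples. It is not the VIVO ingest tooling: the column names staff_id and name, the base namespace, and the URI pattern are all hypothetical, and only a type and a label are emitted for each person.

```python
import csv
import io

# Hypothetical namespace for minted URIs; choose your own and keep it stable.
BASE = "http://projects.myuni.edu.au/individual/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"


def escape_literal(text):
    """Escape a string for use inside a double-quoted N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def person_uri(staff_id):
    """Mint a URI from the stable HR identifier only, never from the name."""
    if not (staff_id.isascii() and staff_id.isalnum()):
        raise ValueError(f"unexpected staff id: {staff_id!r}")
    return f"{BASE}person{staff_id}"


def csv_to_ntriples(csv_text):
    """Convert rows with staff_id and name columns into N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        uri = person_uri(row["staff_id"].strip())
        label = escape_literal(row["name"].strip())
        lines.append(f"<{uri}> <{RDF_TYPE}> <{FOAF_PERSON}> .")
        lines.append(f'<{uri}> <{RDFS_LABEL}> "{label}" .')
    return lines
```

Because the URI depends only on the identifier, re-running the feed after someone changes their name updates the label without changing the URI that other records point to.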

2. Have a persistent URI, Linked Data infrastructure

The expectation in semantic web usage is that the URIs you use to identify entities (collections, organisational units, etc) are persistent, and resolve to something sensible. The expectation for the semantic web is that what the URIs resolve to can be machine readable, but should also be human readable (by default with content negotiation, to deal with both RDF and HTML).

The infrastructure to make the Linked Data URIs resolvable and human readable is provided by the VITRO software, but you still need to decide on the namespace for your URIs. Best practice is *not* to use an opaque identifier like Handle; opaque identifiers insulate the user from context and meaning changes around entities, but the point of a semantic web identifier is to code meaning. It is also recommended that you keep things simple by using a distinct web site for your entities, such as projects.myuni.edu.au, which can be a namespace distinct from www.myuni.edu.au.
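For readers unfamiliar with content negotiation, the idea can be sketched in a few lines of Python: the same URI is requested by browsers and by machines, and the Accept header decides which representation is returned. This is a simplified illustration of the general mechanism, not a description of how VITRO implements it; the set of RDF media types and the HTML default are assumptions.

```python
RDF_TYPES = {"application/rdf+xml", "text/turtle", "text/n3"}
HTML_TYPES = {"text/html", "application/xhtml+xml", "*/*"}


def parse_accept(header):
    """Parse an HTTP Accept header into (media_type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_representation(accept_header):
    """Return "rdf" or "html" for a request; HTML is the default."""
    for media_type, q in parse_accept(accept_header or ""):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in RDF_TYPES:
            return "rdf"
        if media_type in HTML_TYPES:
            return "html"
    return "html"
```

A browser sending text/html gets the human-readable page, while a harvester sending text/turtle or application/rdf+xml gets the data, both from the same persistent URI.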

3. Have some inhouse capability to deal with the semantic web

RDF is not particularly difficult; you are using VIVO which is an existing well-defined ontology; and RDF has some clear advantages over relational databases in managing information flexibly. But in taking up VIVO, you are making a commitment to maintain semantic web infrastructure. You should be getting value out of this infrastructure, not just to get metadata to ANDS, but also for your own internal tracking of your research outputs.

This means that you need to be able to formulate RDF queries to your VITRO instance locally, and to interpret the results you get. You will need to know your way around SPARQL (the RDF query language).

 http://www.vivoweb.org/files/Implementat… details what resourcing you will need locally to run VIVO effectively across an institution: read it carefully. You will need ongoing positions for a programmer/analyst, to do the RDF work, an information manager, and a trainer/outreach person.

4. Be a good citizen of the Semantic Web

By using VIVO, you are joining in to the Linked Data vision of navigable, well-defined, openly accessible data online. You have a responsibility to keep the data you publish in that way persistent, up to date, and well-curated. That includes updating descriptions to reflect updates in VIVO, and the other ontologies used in VIVO.

It also means participating in the ANDS VITRO community, to ensure that you address your metadata requirements in a way compatible with the rest of the community, and feed any particular issues you are having to the group.

If you do need to describe something out of the current scope of ANDS VITRO—whether in public or private data—you should be using well-established public ontologies, rather than reinventing data representation locally.

Finally, you should be responsive to users outside your institution, who want to make use of the data you have made available in this way.

Thanks Nick!