Nick Nicholas from ANDS makes some excellent suggestions on what institutions should consider when deploying VIVO (VITRO) as a metadata store:
If you’re going to deploy VITRO,
1. Ensure that you’re getting timely feeds of information about parties, activities, and collections from the various sources in your institution.
VITRO needs to act as an aggregator for information about parties, activities and collections, which is held in different sources in an institution. Aggregating this information is a challenge for all metadata store solutions: you will need to get up-to-date information out of HR (parties), the Research Office (activities), the Library (collections), and departments and research centres (collections), in formats which can be processed into VIVO triples. Moreover, you will need consistent, persistent Linked Data URIs for all these entities, so that relations between entities can be represented through VIVO (and translated into RIF-CS).
VITRO has come from the US university system, to address the fragmentation of information sources within universities there; so it is set up to deal with a diversity of sources, and supports both manual and automated feeds, including Excel and CSV: http://www.vivoweb.org/data-ingest-guide. But it is in your institution’s interest to set up workflows to ensure data is kept up to date.
In a few instances, institutions already have feeds into a data warehouse set up to aggregate this information; VIVO then need merely be a layer on top of that warehouse. But if you will be using VIVO to serve as your data warehouse, you will also need to build the feeds into VIVO. Often there are quite a large number of sources of truth which you will need to reconcile; a dozen is not unheard of.
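As a rough illustration of what "processing a feed into triples" involves, here is a minimal Python sketch that turns a CSV export into N-Triples. It is not VIVO's own ingest tooling. The namespace http://people.myuni.edu.au/individual/, the column names staff_id and name, and the helper functions are all assumptions made for the example. It also assumes the staff identifier is already URL-safe (for instance, numeric).

```python
import csv
import io

# Hypothetical namespace for person URIs; choose your own and keep it stable.
PERSON_NS = "http://people.myuni.edu.au/individual/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def person_uri(staff_id):
    """Derive the URI from the stable HR identifier, so that every
    run of the feed produces the same URI for the same person."""
    return PERSON_NS + "person-" + staff_id.strip()


def rows_to_ntriples(csv_text):
    """Turn an HR export (assumed columns: staff_id, name) into N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<" + person_uri(row["staff_id"]) + ">"
        lines.append(f"{subject} <{RDF_TYPE}> <{FOAF_PERSON}> .")
        lines.append(f'{subject} <{RDFS_LABEL}> "{escape_literal(row["name"])}" .')
    return lines


# Made-up sample data standing in for a real HR export.
hr_export = "staff_id,name\n1001,Jane Citizen\n1002,Sam O'Neil\n"
for line in rows_to_ntriples(hr_export):
    print(line)
```

The point of deriving the URI from the source system's own identifier is that re-running the feed updates the same entity rather than creating a duplicate.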
2. Have a persistent URI, Linked Data infrastructure
The expectation in semantic web usage is that the URIs you use to identify entities (collections, organisational units, etc) are persistent, and resolve to something sensible. The expectation for the semantic web is that what the URIs resolve to can be machine readable, but should also be human readable (by default with content negotiation, to deal with both RDF and HTML).
The infrastructure to make the Linked Data URIs resolvable and human readable is provided by the VITRO software, but you still need to decide on the namespace for your URIs. Best practice is *not* to use an opaque identifier like Handle; opaque identifiers insulate the user from context and meaning changes around entities, but the point of a semantic web identifier is to code meaning. It is also recommended that you keep things simple by using a distinct web site for your entities, such as projects.myuni.edu.au, which can be a namespace distinct from www.myuni.edu.au.
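To make the content negotiation idea concrete, here is a small Python sketch of how a server might pick between HTML and RDF for the same URI based on the client's Accept header. VITRO handles this for you; the sketch is only a simplified illustration, not its actual implementation. The AVAILABLE list and both function names are assumptions, and the matching is deliberately simplified (it ignores partial wildcards such as text/*).

```python
# Representations this hypothetical server can produce; the first is the default.
AVAILABLE = ["text/html", "application/rdf+xml", "text/turtle"]


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client does not give one
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's original order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_representation(accept_header):
    """Pick the media type to serve for an entity URI."""
    for media_type, q in parse_accept(accept_header or ""):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in AVAILABLE:
            return media_type
        if media_type == "*/*":
            return AVAILABLE[0]
    # Simplification: fall back to HTML rather than answering 406 Not Acceptable.
    return AVAILABLE[0]


print(choose_representation("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
print(choose_representation("application/rdf+xml"))
```

A browser's Accept header leads with text/html, so a person following the link sees a web page, while an RDF client asking for application/rdf+xml gets data from the very same URI.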
3. Have some inhouse capability to deal with the semantic web
RDF is not particularly difficult; you are using VIVO, which is an existing, well-defined ontology; and RDF has some clear advantages over relational databases in managing information flexibly. But in taking up VIVO, you are making a commitment to maintain semantic web infrastructure. You should be getting value out of this infrastructure, not just to get metadata to ANDS, but also for your own internal tracking of your research outputs.
This means that you need to be able to formulate RDF queries to your VITRO instance locally, and to interpret the results you get. You will need to know your way around SPARQL (the RDF query language).
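For a sense of what that looks like in practice, here is a minimal Python sketch: a SPARQL query listing people and their names, a request built according to the standard SPARQL 1.1 Protocol, and a helper that flattens the standard SPARQL JSON results format. The endpoint URL is hypothetical, as are the function names; where your instance exposes a SPARQL endpoint, and whether it requires authentication, depends on your deployment. The request is built but not sent, and the sample response is made up.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; substitute whatever your own instance exposes.
ENDPOINT = "http://vivo.myuni.edu.au/sparql"

# List up to ten people with their labels, sorted by name.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
  ?person a foaf:Person ;
          rdfs:label ?name .
}
ORDER BY ?name
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results (not sent here)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(results_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(results_json)
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in data["results"]["bindings"]]


# Made-up response in the standard SPARQL JSON results shape.
sample_response = json.dumps({
    "head": {"vars": ["person", "name"]},
    "results": {"bindings": [{
        "person": {"type": "uri",
                   "value": "http://people.myuni.edu.au/individual/person-1001"},
        "name": {"type": "literal", "value": "Jane Citizen"},
    }]},
})

request = build_request(ENDPOINT, QUERY)
for row in rows_from_results(sample_response):
    print(row["name"], row["person"])
```

To run it against a live endpoint you would pass the request to urllib.request.urlopen and feed the response body to rows_from_results.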
http://www.vivoweb.org/files/Implementat… details what resourcing you will need locally to run VIVO effectively across an institution: read it carefully. You will need ongoing positions for a programmer/analyst to do the RDF work, an information manager, and a trainer/outreach person.
4. Be a good citizen of the Semantic Web
By using VIVO, you are joining the Linked Data vision of navigable, well-defined, openly accessible data online. You have a responsibility to keep the data you publish in that way persistent, up to date, and well-curated. That includes updating your descriptions to reflect changes to VIVO and to the other ontologies used in VIVO.
It also means participating in the ANDS VITRO community, to ensure that you address your metadata requirements in a way compatible with the rest of the community, and feed any particular issues you are having to the group.
If you do need to describe something outside the current scope of ANDS VITRO, whether in public or private data, you should be using well-established public ontologies, rather than reinventing data representation locally.
Finally, you should be responsive to users outside your institution, who want to make use of the data you have made available in this way.
Thanks Nick!