Issue 361: Recording an E41 in RDF
posted by Richard on 15/1/2018
Hi,
It's perhaps telling that I even have to ask this question at this stage in the game.
I'm not sure how to encode a person's name in RDF in a CRM-compliant manner. It's an E41 Appellation, and is linked to the person by a P1_is_identified_by property, I'm assuming. So far, so good.
However, it looks as though I have the choice of not stating that it is an E41, or of connecting the E41 to its string value via a property which is nowhere defined in the CRM:
freeukgen:b65432#born a crm:E21_Person;
crm:P1_is_identified_by "Light, Thomas Edward" .
or:
freeukgen:b65432#born a crm:E21_Person;
crm:P1_is_identified_by [
a crm:E41_Appellation;
{has-string-value} "Light, Thomas Edward" ] .
The CRM definition gives strings as examples of E41, which implies that the first form is acceptable. However, my instinct says that it is wrong to finesse the fact that it is an E41 in this way. If the E41 is to be expressed, as in my second form, I would welcome advice as to what the value of "{has-string-value}" should be.
Whichever approach is correct, I am struck by the absence of a primer which says, in straightforward terms, "this is how you encode CRM concepts in RDF". If it exists and I have simply missed it, please point me in its direction and I will spread the word ...
posted by Martin on 15/1/2018
Right. We have often discussed it, but I am not sure if we have written a guideline, and it is not in the right place, or if we have only exchanged e-mails about it.
I put it as an issue, in case it's new. The point is that we cannot make rdfs:label a subproperty of P1.
posted by Richard on 16/1/2018
On 15/01/2018 19:52, Martin Doerr wrote:
> Right. We have often discussed it, but I am not sure if we have written a guideline, and it is not in the right place, or if we have only exchanged e-mails about it.
> I put it as an issue, in case it's new. The point is that we cannot make rdfs:label a subproperty of P1.
More generally, I would argue that there should be clear guidance on the whole subject of "implementing an RDF instantiation of the CRM". I was very pleased with the guidance for recording dates which we recently worked on, and assumed that was just an outlier which had been missed up to now. If we are seriously expecting implementors to produce RDF solutions which embody the CRM, we must provide them with comprehensive and specific guidance - maybe a range of implementation options. In my understanding of it, the problem areas are mostly at the "sharp end" where the actual data comes in.
posted by Maria Theodoridou on 16/1/2018
Dear all,
Being very much involved with mappings to the RDF implementation of CRM, I would benefit a lot from clear guidance on the whole subject of "implementing an RDF instantiation of the CRM", as Richard states.
In CAA2016 we presented some Methodological tips for mappings to CIDOC CRM and among others we (a.k.a. Martin) claim the following:
4.2 Common database fields: Appellations
The RDF class rdfs:label and CRM class E41 Appellation are alternative implementations for the same concept in RDF, a human-readable name for the subject. So, for simplicity, when mapping contemporary names into RDF, we suggest the use of rdfs:label tagged with a language attribute. The use of the E41 Appellation class is required only if there is need to assign some additional properties to the Appellation such as properties of use or attribution.
Instances of E41 Appellation “are cultural constructs; as such, they have a context, a history, and a use in time and space by some group of users.” and thus E41 Appellation is appropriate for historical names.
Since then, I have several times received questions related to this issue, and apparently there are a few ways to deal with it. One recent e-mail mentioned "we were advised to use E55_Type > P1_is_identified_by > E41_Appellation > P3_has_note > E62_String" and I was asked if this is the way to go.
If I am not wrong, the different ways to approach this were the main (probably the only) incompatibility between the Herculaneum data and WissKI data in Tbilisi. George knows the details.
Looking forward to official guidelines,
posted by George on 16/1/2018
Variants of this issue come up often with people actually trying to implement the CRM, and indeed the lack of a consolidated implementation guide, to my knowledge, leads to incompatible implementations, which undermines the integration and interoperability we want to support. So I too think it should be raised as an issue.
posted by Richard on 16/1/2018
On 16/01/2018 13:07, Maria Theodoridou wrote:
>
> Dear all,
>
> Being very much involved with mappings to the RDF implementation of CRM, I would benefit a lot from clear guidance on the whole subject of "implementing an RDF instantiation of the CRM", as Richard states.
I have started an "issues with RDF" document, but on reflection it may be more constructive to make it into a first attempt at the guidance I am asking for. I'll spend this afternoon pulling together material which I can easily find (e.g. the introductory comments in the RDF Schema document), and see what questions that exercise answers.
> In CAA2016 we presented some Methodological tips for mappings to CIDOC CRM and among others we (a.k.a. Martin) claim the following:
>
> 4.2 Common database fields: Appellations
> The RDF class rdfs:label and CRM class E41 Appellation are alternative implementations for the same concept in RDF, a human-readable name for the subject. So, for simplicity, when mapping contemporary names into RDF, we suggest the use of rdfs:label tagged with a language attribute. The use of the E41 Appellation class is required only if there is need to assign some additional properties to the Appellation such as properties of use or attribution.
>
> Instances of E41 Appellation “are cultural constructs; as such, they have a context, a history, and a use in time and space by some group of users.” and thus E41 Appellation is appropriate for historical names.
>
I think the principle is valid, but rdfs:label is a property, not a class, so I think that "rdfs:label" should be replaced by "rdf:literal" (or possibly "rdf:plainLiteral"[1]) in the above text. The point I assume that Martin is making is that the value of a P1_is_identified_by property can be finessed into a string if you have nothing more interesting to say about that value.
> Since then, I have several times received questions related to this issue, and apparently there are a few ways to deal with it. One recent e-mail mentioned "we were advised to use E55_Type > P1_is_identified_by > E41_Appellation > P3_has_note > E62_String" and I was asked if this is the way to go.
This is the sort of endless class-property-class-... chain which leads me to question whether the CRM is an efficient basis for an RDF implementation. Using Martin's short-cut above, you could replace the last three elements of this expression by a string. (Unless, for example, you also want to say that the Appellation has an alternative form, in which case the full structure is required ... and useful.)
(E55_Type is another question: I would like to tease out how we implement in RDF its stated role of representing "concepts denoted by terms from thesauri and controlled vocabularies".)
> If I am not wrong, the different ways to approach this were the main (probably the only) incompatibility between the Herculaneum data and WissKI data in Tbilisi. George knows the details.
posted by Robert Sanderson on 16/1/2018
Hi Richard, all,
In linked.art, we of course ran into this issue too! We ended up going for rdf:value for *all* values in the model for consistency, lacking another way to associate values with Appellations and Linguistic Objects, in particular. For example, see http://linked.art/model/object/identity/#titles
We discussed using p3_has_note, as a predicate for capturing arbitrary values, but it runs into three major problems out of the gate:
- It’s for “informal description that have not been expressed in terms of CRM constructs” and Appellation / Linguistic Object are clearly CRM constructs
- It would be ambiguous with other uses of P3 without P3.1, and RDF does not allow relationships on relationships. Using reification just to express a string is clearly far too expensive for P3 to work.
- E90 adds a use of P3 as “The property P3 has note allows for the description of this content model” … rather than the actual value of the resource.
What would make us change our minds and use a CRM construct instead of rdf:value? A has_value property that covered Appellations and Linguistic Objects… which would mean that it could be associated with E90 Symbolic Object, with a scope note saying that it is the set of “identifiable symbols” that makes up the resource.
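As a rough illustration of the pattern described above, the following Python sketch emits N-Triples for a person identified by an E41 Appellation whose string is attached with rdf:value. The person URI, the blank node label and the function names are hypothetical, and the CRM namespace is assumed to be the one quoted later in this thread. Using rdf:value here reflects the linked.art choice, not an official CRM recommendation.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"          # assumed CRM namespace
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def escape(text):
    """Escape backslashes and double quotes for an N-Triples literal.

    Newlines and other control characters are not handled in this sketch.
    """
    return text.replace("\\", "\\\\").replace('"', '\\"')


def name_as_appellation(person_uri, name, bnode="_:name1"):
    """Return N-Triples lines: person -P1-> E41 node -rdf:value-> string."""
    return [
        f"<{person_uri}> <{RDF}type> <{CRM}E21_Person> .",
        f"<{person_uri}> <{CRM}P1_is_identified_by> {bnode} .",
        f"{bnode} <{RDF}type> <{CRM}E41_Appellation> .",
        f'{bnode} <{RDF}value> "{escape(name)}" .',
    ]


# Hypothetical person URI, for illustration only.
lines = name_as_appellation(
    "http://example.org/person/b65432", "Light, Thomas Edward"
)
print("\n".join(lines))
```

Because the Appellation is a node in its own right, further statements (alternative forms, periods of use) could be attached to it later without changing the shape of the existing data.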
posted by Jim on 17/1/2018
Richard and SIG members,
On 16/01/2018, Richard Light wrote [rest of thread snipped for brevity]:
“I have started an "issues with RDF" document, but on reflection it may be more constructive to make it into a first attempt at the guidance I am asking for. I'll spend this afternoon pulling together material which I can easily find (e.g. the introductory comments in the RDF Schema document), and see what questions that exercise answers.”
The recent flurry of conversation relating to the interplay of #cidocCRM and #RDF is most interesting and timely, both to me personally and, I believe, to the larger SIG mission of championing our model’s utility to those who are interested but hesitant to explore and adopt it in practice.
== On the "Big Picture" Community Level... ==
1. Richard, I would be very interested to see your working document mentioned above as soon as it is available and would love to be involved in its draft evolution as I would qualify as a highly-motivated non-expert reader with good writing/editing skills.
2. I know that this mailing list is very focused on the "tight" conversations of core and significant modeling issues and their resolution. Given that wrestling with "#cidocCRM in #RDF" is itself a gnarly domain that will likely engender its own level of detailed conversation, and given that the SIG is currently having an in-person meeting on current issues and future directions, might it be appropriate, via the energy and interest at the current meeting, to form a Working Group on this topic and spawn its own mailing list with a charter to explore this topic and come back to the full SIG with draft documents (e.g. the afore-mentioned "primer") and recommendations in response to its charter? If such a working group were to be formed, I would very much like to be involved.
Putting on my "marketing hat" for a moment, I believe that addressing #cidocCRM in #RDF well, especially through practical, example-based documentation and learning materials, is the most important initiative we can take at this time to advance the adoption of the #cidocCRM in deployed and new #LOD systems/collections.
Happy-Healthy Vibes to All and a Happy New Year,
-: Jim:-
www.researchgate.net/profile/Jim_Salmons
www.medium.com/@Jim_Salmons/ (my #CognitiveComputing/#DigitalHumanities articles)
P.S. As a postscript, I provide these comments with regard to my own personal learning and research experience...
== Optional on my Personal Interest in #cidocCRM & #RDF ==
At a personal level, some in the SIG know that I am a U.S.-based independent (and untrained) #CitizenScientist working my post-cancer #PayItForward Bonus Rounds to contribute my best efforts at the intersection of #DigitalHumanities and #CognitiveComputing. As a “software guy” I spent the bulk of my career as a Smalltalk developer and was particularly active during the initial wave of the software patterns movement. I was drawn to the #cidocCRM through my desire to apply ideas for metamodel-driven design of “self-descriptive executable model” frameworks from my prior Smalltalk work. I want to apply these ideas to my research that takes advantage of the emerging technology of graph databases. As a “pure OOP” Smalltalker, I had a “knee-jerk” reaction of disinterest in #RDF as its level of detail in notation reminded me too much of what we “pure OOPers” felt about the object-orientedness of C++ and Java.
I have been using Neo4j’s property graph database for my initial applied research but lately became disenchanted with it. As I surveyed my technology-provider options, I decided that my piqued interest in Linked Open Data warranted a reevaluation of #RDF and the available triple store products as a means to pursue my work in development of the MAGAZINE #GTS (ground-truth storage) format based on a #cidocCRM/FRBRoo/PRESSoo ontological “stack.”
I am now fully committed to redirecting my #cidocCRM-based research platform around #RDF (along w/ #TEI) primarily for these three reasons:
* I found Ontotext's GraphDB to be an excellent company and technology, both in its principal product and in its all-important documentation, self-driven learning resources, and its helpful tech support community.
* Once I was "bitten" by GraphDB, I began an intensive effort to come up to speed on #RDF through self-study and found the most incredibly-written and super-helpful book, "Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL, 2nd Edition" by Dean Allemang and James Hendler (book companion website http://www.workingontologist.org).
* My interest in software patterns led me to Pascal Hitzler (http://www.pascal-hitzler.de/) and the ODPA, the Association for Ontology Design & Patterns and their website at http://ontologydesignpatterns.org with associated Google group mailing list at this shortened URL https://goo.gl/x6MJjM. Through my initial involvement in this community, I am excited to note that I will be attending #us2ts, the 1st U.S. Semantic Technologies Symposium in early March in Dayton, Ohio. Of course I will be bringing my interest in ontology design patterns and the #cidocCRM to this event which is geared toward developing a North American cross-discipline semantic technologies research community. More information on this event is here http://us2ts.org/.
Finally, I am also pleased to note that as part of my #PayItForward Bonus Rounds I served on the Program Committee of #DATeCH2017 and my fellow cancer-survivor wife and I had two papers accepted for a poster at this event, a PDF of which is available here https://1drv.ms/b/s!AtML1v0eUlpEgoAJ_FH6CMU5luOUBA.
To those who read this optional postscript... another
posted by Richard on 17/1/2018
Jim,
Thank you for the encouragement. I have put the document in its current form at:
https://docs.google.com/document/d/1zCGZ4iBzekcEYo4Dy0hI8CrZ7dTkMD2rJaxa...
and it is editable by anyone with the link. As you'll see, there is little that is new in there (although there might already be things to argue about!), but there is the outline of a more substantive document. All suggestions and contributions gratefully received.
posted by Robert Sanderson on 17/1/2018
Here’s a quick addition …
The RDF representation uses the names of the classes and predicates in the URIs that identify them. This means that when the names change, the URIs change and this invalidates all of the previous uses. As the SIG considers only the number to be important, there is a mismatch of expectations around persistence and versioning.
Examples: E78_Collection versus E78_Curated_Holding and the recent thread about renaming translation_of.
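To make the mismatch concrete, here is a minimal Python sketch that compares two CRM URIs by their E/P number only, ignoring the label part. The function names and the regular expression are illustrative assumptions, not part of any CRM tooling, and the sketch assumes URIs of the form shown in this thread.

```python
import re

# Matches the leading notation of a CRM local name, e.g. "E78", "P148", "P1i".
NOTATION = re.compile(r"^([EP]\d+i?)(?:_|$)")


def crm_notation(uri):
    """Return the E/P number of a CRM URI, or None if it carries none."""
    local = uri.rstrip("/").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
    match = NOTATION.match(local)
    return match.group(1) if match else None


def same_notation(uri_a, uri_b):
    """True if both URIs carry the same E/P number, whatever the label."""
    a, b = crm_notation(uri_a), crm_notation(uri_b)
    return a is not None and a == b


CRM = "http://www.cidoc-crm.org/cidoc-crm/"  # assumed CRM namespace
renamed = same_notation(CRM + "E78_Collection", CRM + "E78_Curated_Holding")
```

A plain RDF store would treat the two E78 URIs as unrelated resources; a comparison like this only helps if the number really does keep its meaning across versions.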
posted by Simon Spero on 17/1/2018
On Jan 16, 2018 10:07 AM, "Richard Light" <richard@light.demon.co.uk> wrote:
I think the principle is valid, but rdfs:label is a property, not a class, so I think that "rdfs:label" should be replaced by "rdf:literal" (or possibly "rdf:plainLiteral"[1]) in the above text. The point I assume that Martin is making is that the value of a P1_is_identified_by property can be finessed into a string if you have nothing more interesting to say about that value.
Some brief RDF / RDFS / OWL notes:
1: The IRI rdf:Literal refers to the set of all possible concrete data values, e.g.
- the real number [1]
- the floating point value [1]
- the temperature [1°C]
- the string (sequence of characters) ['o','n','e'], or ['1']
- a string with an associated natural language tag [<["one"] , ["en"]>] or [<["one"], ["de"]>]
- the English word [one]
It is the top datatype in OWL, and can be used to restrict a property's range in rdfs; however it is usually possible to specify a more precise type.
2: RDF 1.1 removed the concept of Plain Literals (which were literals in an RDF document that had no specified datatype, and which may or may not have a language tag). The type rdf:PlainLiteral was introduced by the OWL working group (at a time when there was no RDF working group), which was mostly ignored when the RDF 1.1 working group was formed.
RDF 1.1 added a new datatype, rdf:langString, which (sort of) denotes the set of all strings with an associated language tag. A langString MUST have a non-empty language tag. PlainLiteral can be approximated as the union of xsd:string and rdf:langString.
The values of langString (and appropriate subset of PlainLiteral) are pairs of strings; there is an extra level of interpretation required to turn them into natural language utterances, but this can be as simple as displaying the string to a user. There need not be a valid interpretation (e.g. the string may not correspond to an utterance in the indicated language).
If the range of a property is intended to be interpretable as natural language utterances then langString (or a defined OWL datatype restricting PlainLiteral to have a non-empty language tag) is usually a good choice.
If a property has string values that do not correspond to a natural language utterance, then using a range of xsd:string is appropriate.
If a property can have values which are strings that may or may not have language tags, then PlainLiteral may be appropriate; however this does not distinguish between strings in an unknown or unspecified natural language, and strings which are Just Strings.
In situations like this it may be useful to define objects to serve as value holders. Doing so can also allow for more detailed restrictions in OWL (e.g. requiring the preferred label for a Concept in a given KOS to be unique for a given language).
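The distinction between the two RDF 1.1 string datatypes can be sketched in a few lines of Python. The helper names are hypothetical; the only facts relied on are that a literal with a non-empty language tag has datatype rdf:langString and that an untagged string literal has datatype xsd:string.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def literal_datatype(language_tag=None):
    """Return the RDF 1.1 datatype IRI implied by an optional language tag."""
    if language_tag:              # a non-empty tag means rdf:langString
        return RDF_LANGSTRING
    return XSD_STRING             # untagged strings are xsd:string


def to_ntriples_literal(text, language_tag=None):
    """Serialise a string, with or without a language tag, as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    if language_tag:
        return f'"{escaped}"@{language_tag}'
    return f'"{escaped}"'
```

For example, `to_ntriples_literal("one", "de")` gives a language-tagged literal, while omitting the tag gives a plain xsd:string; neither form can say "natural language, but which one is unknown", which is the gap noted above.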
3: rdfs:label is an annotation property, which means that it should be used to add metadata describing things in an ontology document, rather than the things the ontology is about. As a consequence of this, any rdfs:label assertions are completely ignored by OWL direct semantics; there are only three axioms that can be used when defining annotation properties (subproperty, domain, and range). Even these are invisible to a direct semantics reasoner (though they can be used by editors and other tools).
4: Simple Literals... orz
posted by Gordon Dunsire on 18/1/2018
All
It is for this reason that the IFLA declaration of URIs for the FRBRoo extension to CRM drops the name, and uses only the notation:
http://metadataregistry.org/schemaprop/list/schema_id/94.html
posted by Phil Carlisle on 18/1/2018
Hi all,
I agree that using the number alone as the identifier would be the way forward particularly with regards to the changing of the name of a class or property.
However this would only work if the domain/range and scope of the class or property remain the same.
There is at least one instance of a property in the CRM where the number has been retained but the context of the property has completely changed.
The property in question is P148.
In the CRM version 4.2.2 we had:
P148 is identified by (identifies)
Domain: E28 Conceptual Object
Range: E75 Conceptual Object Appellation
Subproperty: E1 CRM Entity. P1 is identified by (identifies): E41 Appellation
Quantification: many to many (0,n:0,n)
Scope note: This property identifies a name used specifically to identify an E28 Conceptual Object.
This property is a specialisation of P1 is identified by (identifies) is identified by.
Examples:
§ The publication „Germanisches Nationalmuseum (GNM), Fuehrer durch die Sammlungen” (broschiert), Prestl 1995 (E73) is identified by ISBN 3-7913-1418-1 (E75)
According to the appendix of CRM 5.1.2, which lists the amendments to CRM 4.2.5, the property P148 was changed as follows:
P148 has been changed
BEFORE
P148 is identified by (identifies)
Domain: E28 Conceptual Object
Range: E75 Conceptual Object Appellation
Subproperty: E1 CRM Entity. P1 is identified by (identifies): E41 Appellation
Quantification: many to many (0,n:0,n)
Scope note: This property identifies a name used specifically to identify an E28 Conceptual Object.
This property is a specialisation of P1 is identified by (identifies) is identified by.
Examples:
§ The publication „Germanisches Nationalmuseum (GNM), Fuehrer durch die Sammlungen” (broschiert), Prestl 1995 (E73) is identified by ISBN 3-7913-1418-1 (E75)
AFTER
P148 has component (is component of)
Domain: E89 Propositional Object
Range: E89 Propositional Object
Superproperty of:
Subproperty of:
Quantification: (0:n,0:n)
Scope note: This property associates an instance of E89 Propositional Object with a structural part of it that is by itself an instance of E89 Propositional Object.
Examples: The Italian text of Dante’s textual work entitled “Divina Commedia” (E33) P148 has component The Italian text of Dante’s textual work entitled “Inferno” (E33)
In the amendments to CRM 5.0.3 we have, unbelievably, the following:
P149 is identified by (identifies)
It is decided to create a subproperty of P1 to connect E28 with E75 as follows
P149 is identified by: E75
Domain: E28 Conceptual Object
Range: E75 Conceptual Object Appellation
Subproperty of: E1 CRM Entity. P1 is identified by (identifies): E41 Appellation
Quantification: many to many (0,n:0,n)
Scope note: This property identifies an instance of E28 Conceptual Object using an instance of E75 Conceptual Object Appellation.
Examples: The German edition of the CIDOC CRM (E73) is identified by ISBN 978-3-00-030907-6 (E75)
In this instance if the URI http://www.cidoc-crm.org/cidoc-crm/P148 had been in use in any implementation based on CRM 4.2.2 the change in label, domain and range would not have been picked up by an automatic update.
Furthermore at no point would it have been obvious that all instances of http://www.cidoc-crm.org/cidoc-crm/P148, in the original meaning, should be replaced with http://www.cidoc-crm.org/cidoc-crm/P149
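The kind of manual migration this implies might look like the Python sketch below. It is only an illustration under stated assumptions: triples are held as plain (subject, predicate, object) tuples, the example data is invented, and the data owner supplies a flag saying whether the dataset used P148 in its original "is identified by" sense, since nothing in the URI itself reveals that.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"

# Old meaning of P148 ("is identified by") now lives under P149.
OLD_TO_NEW = {CRM + "P148": CRM + "P149"}


def migrate_triples(triples, uses_old_p148_meaning):
    """Rewrite predicates only when the data owner confirms the old meaning."""
    if not uses_old_p148_meaning:
        return list(triples)
    return [(s, OLD_TO_NEW.get(p, p), o) for s, p, o in triples]


# Hypothetical data, for illustration only.
legacy = [
    ("http://example.org/pub/1", CRM + "P148", "http://example.org/isbn/1"),
    ("http://example.org/pub/1", CRM + "P1", "http://example.org/name/1"),
]
migrated = migrate_triples(legacy, uses_old_p148_meaning=True)
```

The flag is the weak point: an automatic update has no way to set it, which is exactly the problem described above.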
This may have been an oversight on the part of the CRM-SIG; however, I would strongly suggest that in future, if the SIG wants to change a property or class, it checks with those system owners who have actually been using the CRM in the real world, to ensure that these whims do not affect the smooth running of any current implementations.
If the aim of the CRM is to facilitate data exchange it would imply that each implementation should be able to rely on the properties and classes not changing their fundamental essence.
Re-use and re-assignment of numbers and labels is, to my mind, exceptionally bad practice.
In the 41st joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 34th FRBR - CIDOC CRM Harmonization meeting, the SIG decided to merge issue 361 and issue 363, since both concern the encoding of CIDOC CRM classes and properties in RDF. The discussion will continue in issue 363. This issue is closed.
Lyon, May 2018