Issue 528: Guidelines and Protocols for Translating CIDOC CRM

ID: 
528
Starting Date: 
2021-02-25
Working Group: 
4
Status: 
Open
Background: 

Posted by George on 25/02/2021

Dear all,

With the advent of CIDOC CRM 7.1, a new stable community version of the CIDOC CRM (aimed at ISO approval) is established. This is the occasion for the broader community wishing to implement the standard on a stable basis to invest in and engage with a mature ontological specification and text.

A key aspect of this work at the community implementation level is to render the standard in various languages so that it can be studied, appropriated and applied without linguistic barriers by different linguistic and cultural communities around the world.

Towards this end, the task of translation is key and an important intellectual process and product of the CIDOC CRM community in its own right. 

The formulation of open, transparent and regular protocols and processes for creating a translation would thus be a crucial groundwork to lay out in order to give the appropriate support and weight to the translation efforts of the CIDOC CRM semantic data community.

At present, a search of the website (using the website search tools) returns only one article regarding translation. It is an issue from 2002 (http://www.cidoc-crm.org/Issue/ID-58-how-to-organize-the-translation-of-the-model) on how to organize the translation of the CIDOC CRM. 

It would seem then that there is a need to pick up this issue again and address its various aspects (especially given the phenomenal growth of the CIDOC CRM uptake and the spread of its use to different linguistic communities around the world).

It seems prudent therefore to communally formulate guidelines for translation best practice and, separately, open and explicit protocols for the submission and acceptance of CIDOC CRM translations, to be developed and put into action by the community.

The spirit of the guidelines and protocols should be to make a transparent space for engaging in this important work and understanding its relation to the overall CIDOC CRM community effort. It should aim to support existing translation efforts and provide an obvious, open and transparent path for additional translation efforts.

Of consideration for inclusion in these guidelines and protocols are the following topics:

Protocol for Starting an Official Translation

Who can start an official translation, are there any preconditions?

Protocol for Accepting an Official Translation 

What are the criteria for accepting a translation as official? 

When do the translated classes and properties pass into the serializations?

Is there recognition of the translating group in the serialization (for the respective translation element)?

Recommended Tools for Supporting Translation

Are there any tools recommended for supporting translation? Any recommended methods?

Networks of Support (Community of Translation Projects)

The translation of the CIDOC CRM is the translation of an ontological description of CH data that aims to be neutral. The translation of the standard requires a creative effort to understand and elucidate the conceptual objects specified in the ontology. Given the complexity of this effort, which involves philosophical, computer science and cultural heritage specific knowledge, the process can be quite challenging. Sharing experiences across language translations may help elucidate problems in understanding the standard or in finding useful philosophical correlate expressions in different languages.

Do/can we facilitate a place of exchange on these topics?

Means of Approaching (Ontological Translation Methodology)

Are there better or worse methods for approaching the translation task as such? 

E.g.: should one translate classes and properties from E1 to En, P1 to Pn or should one follow the ontological hierarchy?

What are key terms that might best be approached first in order to support the general translation? (E.g.: Space Time Volume?)

Change Management - Version Compare

What is the best way to manage iteration between versions while keeping translation efficient? (We don’t want to retranslate everything if it can be avoided.)

Place of Publication of Translation and Level of Recognition

Where are official translations published? Are they sufficiently visible? What is their relation to serializations?

Copyright Issues

Under what copyright should translations be made?

Infrastructure to Support Publication / Promotion of Translations

Is there any? Should there be any?

Template for Translators’ Introduction

The translation work in itself is another intellectual work which requires many important choices and requires the introduction of an interpretation of meaning and sense. A translator’s introduction then would be important in order to convey important decisions and methodological choices. Should this be standardized?

The above represents a first set of ideas. I propose we have a general discussion of this question and see if there is interest and capacity in the membership to create such guidelines and protocols.

Current Proposal: 

Posted by Anais Guillem on 25/02/2021

Hi CRM-lovers,

I would like to follow up on George's email about the translation. In October 2019, a group of French archaeologists and CH specialists expressed an interest in translating the latest version and the future version 7 in order to disseminate CIDOC CRM more easily. The translation project is now an international (France, Belgium and Canada) and collaborative effort. It is mostly inspired by Wiki contributions and everything is done in Gitlab with version control. The group meets (via Zoom) once a month to establish priorities and discuss the different issues.

The project is open to anyone interested in contributing to the translation in French: you just need a Huma-Num account.

https://gitlab.huma-num.fr/bdavid/doc-fr-cidoc-crm

The translation files could be used for translations in other languages. The diagrams are also in the process of translation. The translation issues are discussed in the Gitlab issues. The how-to is explained in the Wiki section of the gitlab project. 

It would be very interesting to know if there are currently other translation projects in other languages, so as to compare process and methodology. The git repository could be cloned if another group wants to translate the ontology into another language.

Posted by Philippe Michon on 27/02/2021

Dear all,

As this issue arises from a discussion between George and us at the Canadian Heritage Information Network (CHIN), I just wanted to confirm that we are greatly interested in this issue. 

The main reason is that we must have a French version in order to be able to use CIDOC CRM within our organization. Indeed, we have rules on bilingualism that oblige us to provide, within strict time limits, a quality French equivalent (one that meets the quality and maintenance standards of governmental agencies) of the standards to which we refer.

We are contributing to the French translation initiative presented by Anaïs. In addition, for administrative reasons, we are in the process of setting up a specific translation process for the Canadian team.

Of course, we will share with you as soon as possible the documents that we will make publicly available to our editors and partners. Here is a list of what we plan to share in the coming year:

  1. Google Docs translation templates

  2. Protocol to convert Google Docs templates to Markdown (our goal is to publish on GitHub Pages)

  3. Stylesheet

  4. Index of CIDOC CRM entities (translated)

  5. Update protocol (e.g. 7.0 to 7.1)

  6. Spreadsheet for keeping track of the typos in the English version

  7. List of the translation challenges

  8. Best practices for translation

We hope that our work will serve as a foundation for the development of general recommendations and protocols in order to further democratize CIDOC CRM.

We look forward to participating in discussions concerning this issue.

Posted by Franco on 27/02/2021

Dear all,

the appearance of this issue is the sign of the vitality, importance and diffusion of the CRM.

Undertaking a translation poses a number of issues that need to be addressed before moving to practicalities.

The “Canadian case” shows the need to comply with legal constraints. For example, if a country formally decides that the national standard for cultural heritage documentation is the CRM, the related decree will need to have an appendix with the approved CRM version, and I think that it would not be acceptable to include it in English; it should be in that country’s national official language(s). Thus it is better to have an ‘approved’ translation in advance, to guarantee that the ‘official’ text is a faithful one. This may also resolve contractual issues, for example with companies contracted to prepare heritage documentation compliant with CRM.

On the other hand, using different translated versions of the CRM may - at least in principle - undermine its universality. Even if machine actionability would eventually be preserved, attention must be paid to the human side of the job, to guarantee that scope notes - for example - give the same meaning to labels across translations.

What should be translated? Of course, the discursive part, as the introduction - the pages numbered with Roman numerals in the CRM description. But, they contain examples and references to Classes and Properties, for which the specific rules should apply. For example, the statement on page xi "In CIDOC CRM such statements of responsibility are expressed though knowledge creation events such as E13 Attribute Assignment and its relevant subclasses.” includes such a reference that must follow the translation rules for Class names. 
Another example is the “IsA” relationship. If translated, it contains the indefinite article “A”, which in some languages must follow the grammatical gender of the term it refers to, and thus gets two or three equivalents. So my choice would be to consider it as a symbol and keep it in English in the translations as well. There may be other issues of this kind, so a general directive should be 1) established and 2) accepted according to local constraints. I believe that the decision could be easy in this particular case, but it must be decided for all similar occurrences.

The above leads me to think that before undertaking any translation, the official English version should be examined to evaluate what is English - and may be translated - and what is symbolic and just seems English - not to be translated. IsA is an example, there may be others. The translation may be funny from a literary point of view (“Martin Doerr IsA un homme”), so an explanation could be given - maybe in a footnote - to help understandability.

Naming conventions (pages xiv - xv) should of course be preserved. Here examples are given in Italic e.g. "E53 Place. P122 borders with: E53 Place”. I am not completely clear with the need of a full stop after Place (could be a typo from copy-paste), but also the use of Italic is introduced surreptitiously. By the way, it is maybe high time to establish a recommendation to standardize how to quote class and property names e.g. in articles, in order to distinguish them from plain discourse also typographically.

Coming to scope notes, I think that only the symbolic parts should remain in English, i.e. the alphanumeric label e.g. “E1”. 

The above are just examples of what a preventive survey of the official English text will define as “not translatable”. In my opinion it wouldn’t take much time to do it.

The next step is what George calls “translation rules”. I am looking forward to fierce debates about the translation of “Human-made”: whether it should follow the style of the Musée de l’Homme (“fait par l’homme”) or choose a gender-neutral “anthropogenic” or something else.

I agree with George on the necessity of general guidelines and protocols for translation. But since these depend on the culture behind the language into which the CRM is going to be translated, accepting them is not automatic: how can a native English (or Greek, or German) speaker decide what is better for Italian or French? So such protocols should be stated in a general form, and then implemented language by language, which brings us back to George’s topic about "What are the criteria for accepting a translation as official?” and who is in charge of it. There may be different levels of “acceptance”, e.g. a working text, a published translation for comments, a technically approved one and a linguistically approved one. I would feel confident enough to address the first three levels, but for the highest level I would need the support of linguists - better if official ones.

To build on what is already being undertaken: who decides if the French Canadian version is OK? Is there any potential conflict between what the SIG (or any judge established by it) decides and the decision by an officially established Canadian referee for effective bilingualism?

Finally, copyright. The copyright statement on the title page of the CRM documentation, "Copyright © 2003 ICOM/CRM Special Interest Group”, in my opinion sounds a bit old-fashioned and unpleasant. There are nowadays more appropriate licensing schemes that allow public open use, give appropriate recognition to authors, and protect the moral rights of those involved in the work, people and organizations, while avoiding any unauthorized commercial exploitation. In the era of Open Science it sounds a bit conservative. The same should apply to translations.

As you may have understood from this long email, I am interested in the adventure, both in preparing the general framework and in supporting a translation into Italian. If useful, we can advertise the initiative through various networks, to inform those potentially interested in the job.
 

Posted by George 03/03/2021

Dear all,

Thanks already for your valuable feedback and uptake on this proposal. I am pleased to say that this issue has been added to the official CRM SIG issue list:

http://www.cidoc-crm.org/Issue/ID-528-guidelines-and-protocols-for-translating-cidoc-crm

It is also scheduled to be discussed in the afternoon session of the upcoming SIG on Monday March 8th. I do hope everyone responding here and all others interested in this topic will be available to share their knowledge and help us move this subject forward.

Posted by Massoomeh on 8/3/2021

Dear All,
Thank you George for proposing this issue. I totally agree with this proposal. Drawing on our experience with translating the Model into Persian, Omid Hodjati and I have answered your questions. Please follow this link to see the slides of our answers.

I am looking forward to the discussion tonight!

In the 49th joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9; 42nd FRBR – CIDOC CRM Harmonization meeting, Philippe Michon brought the SIG up to date with the CHIN initiative to translate the CRM into French. The SIG decided to put together a WG to discuss translation-related issues (methodology, protocols, tools and software to apply when translating the CRM).

The initiative will be led by Philippe Michon (HW: to inform the SIG and set up the WG).

The translation WG should take into consideration the following aspects: 

  1. content guidelines
  2. interoperability standard + versioning tools
    • too many tools 
    • structure units and mark them up etc.
  3. communication and validation protocols 
  4. all teams engaging in CRM translation projects should be mentioned at a visible place on the website (translations page).  
    HW: everyone leading an official translation project to share details 

March 2021

Post by Philippe Michon (18 June 2021)

Dear all,

In anticipation of the SIG meeting, I wanted to inform you of the progress of the work of the CIDOC CRM Translation Guidelines Working Group.

First of all, I would like to remind you that our current mandate is to discuss issues related to translation, in particular questions relating to methodology, protocols, and tools.

The working group is made up of 13 members who represent at least 8 different languages. I would like to take this moment to thank those who have contributed to the reflection during these last two months.

The working group met twice. The first meeting made it possible to present our respective projects in addition to initiating a reflection on the needs that the group should address. The second meeting made it possible to identify more clearly the needs and the documentation that we will have to develop. We have also highlighted certain aspects that go beyond our mandate.

Without going into details in this email, we are considering the creation of 5 potential documents:

  1. "Guide of CIDOC CRM Best Translating Practices" which will define the different levels of translation, the expertise required, the workflows and recommendations on how to properly develop a style guide.
  2. "Governance Guidelines" which will define the licensing options, a translation policy and rules to ensure quality translations.
  3. "Comparison and Update Protocol" which will make it possible to easily compare versions, in particular by explaining how changes will be tracked. This document will also include the mechanisms to ensure the improvement of the original version, in particular by the presentation of a clear communication protocol between the SIG and the translation initiatives.
  4. "Introduction for translators who are new to CIDOC CRM" which will serve as a practical guide for translators who are less familiar with CIDOC CRM. It is presently contemplated to reuse documents which already exist.
  5. "Tools and Interchange Protocols" which will define the technological aspects which will facilitate the exchange of information between the SIG and the translation initiatives. We are thinking in particular of questions of formats, templates, styles, compatibility, updates, bibliography management and tools.

 

As mentioned above, some aspects are outside the scope of our working group and for this reason, we would like to solicit the participation of the SIG with regard to the following aspects:

  1. We believe that it is important to give visibility to translation initiatives on the CIDOC CRM website, in particular to be able to quickly identify current initiatives, but also to easily access the documentation that we are going to produce.
  2. We would also need your insight into governance, particularly in terms of licensing, the ecosystem the initiatives will be part of, and publication formats.
  3. We believe that a comprehensive glossary to cover certain ambiguous terms would be very useful to allow a quality translation.
  4. Finally, in order to facilitate the creation of references, direct access to the SIG bibliography on Zotero would be appreciated.

 

In conclusion, the next few months will be devoted to writing this documentation and we invite those wishing to participate in the initiative to contact us.

Everything will be presented to you in a more detailed fashion during the SIG meeting; here is the visual support that will accompany the presentation if you ever want to consult it in advance.

All the best,
Philippe

In the 50th joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9; 43rd FRBR – CIDOC CRM Harmonization meeting, PM gave an overview of the direction in which the work of the translation initiative is heading (the proposed guidelines documents and the topics they should cover) and of where they expect that a closer collaboration with the SIG will be required. Slides here.

Discussion points: 

  • MD reported that the team at ICS-FORTH working on the implementation of the CIDOC CRM website have cut down the official document to pieces (coherent sections of Introduction; Class definitions; Property definitions).
    • Versioning will be made available for each of these sections.
    • Translations are also linked with their respective sections, at least for these CIDOC CRM versions that have them.
    • Breaking the document down to suitable sections will be part of the prototype that FORTH will produce. It will result in a style-free, xml format of the text.
    • Breaking down the text in sections matching the ones used for the original documents could also be implemented in translations. If it works, it could serve as the basis for the interchange protocol with all the translation initiatives.
      • It is practically impossible to keep 8+ translations in sync with the Official (English) document. The translations, where available, should refer to their respective English version, so users will know if translation x corresponds to the official v7.1.1 or a prior one.
      • Links to the relevant text of the respective version should be shown –to be compared with the Official and/or current version of the CRM. 
      • Translations of obsolete versions of the CRM will be reworked to fit the xml partitioned document sections. This way, versioning will also be available for translations (under resources).
  • PR noted that the translation groups that might be interested in translating a particular extension do not necessarily involve the same people working on the translation of CRMbase, for that same language. Given the number of projects that people working on extensions are involved in, the guidelines documents should be kept relatively small or they’ll discourage candidates. 
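
The versioning point above (each translation should state which official English version it renders) can be illustrated with a small sketch. It assumes a hypothetical record format; the `Translation` class, its field names, the placeholder language codes and the sample entries are illustrative assumptions only, not part of the website design discussed by the SIG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    """Hypothetical record: one translation of one official English version."""
    language: str          # placeholder language code
    source_version: str    # official English version rendered, e.g. "7.1.1"


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '7.1.1' into (7, 1, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def status(translation: Translation, current_official: str) -> str:
    """Say whether a translation matches the current official version or a prior one."""
    source = parse_version(translation.source_version)
    current = parse_version(current_official)
    if source == current:
        return "current"
    if source < current:
        return "prior"
    return "newer-than-official"


# Illustrative data only; these are not real translation records.
translations = [
    Translation(language="xx", source_version="7.1.1"),
    Translation(language="yy", source_version="7.0"),
]

report = {t.language: status(t, "7.1.1") for t in translations}
print(report)
```

With records like these, a listing page could show next to each translation whether it corresponds to the official v7.1.1 or to a prior version, without requiring the translations to be kept in sync.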

DECISION: continue with this line of work. 

HW: GB to determine the most suitable place on the website for the translations and the translation guidelines to appear under.

Post by George Bruseker (7 October 2021)

Dear all,

I was charged with HW from issue 528. 

"HW: GB to determine the most suitable place on the website for the translations and the translation guidelines to appear under."

 

I hereby link my proposal for the SIG's consideration.

 

https://docs.google.com/document/d/1BiGrX_pieVCCwlNf-JQweHkwTp57mfY58SIPxBvwkhw/edit

 

Best,

George

Posted by Philippe Michon on 7/10/2021

Next Tuesday, I will have the chance to present to you the progress report of the CIDOC CRM Translation Guidelines Working Group. The presentation will focus on the document called "CIDOC CRM Translation Best Practices Guide". The committee is always interested in new ideas to improve the document so do not hesitate to write to us if you have any questions or comments.

 

@George, I will be happy to give you some time in my session to present your thoughts about the translation section on the website. :)

In the 51st CIDOC CRM & 44th FRBRoo SIG meeting, PM gave an outline of the Translation Guidelines document.  Link to the document here: https://docs.google.com/document/d/1AJ7eC3p5NtDeVdlOlhlrB1PI9sCKhm0PTf3bcWZHSN8/edit?usp=sharing

SIG members are invited to provide feedback on the document drafted by the translation WG by the end of October.

Focal points: propose a governance framework for the translation initiatives and help identify missing points that would help yield quality translations for CIDOC CRM

Post by George Bruseker (10 February 2022)

 

Dear all,

A sub aspect of the issue that Philippe will introduce around translations is 'where to put the translations' on the website. I was tasked with doing this some SIGs ago. The HW has been there for a while. Time permitting, we can look at it!

https://docs.google.com/document/d/1BiGrX_pieVCCwlNf-JQweHkwTp57mfY58SIP...

Best,

George

In the 52nd joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9; 45th FRBR - CIDOC CRM Harmonization meeting; PM gave a progress report re the work undertaken by the translation initiative and GB made a proposal re. where the translations should appear on the site. Specifically: 

  1. Progress report by PM. Link to presentation. Points of interest summarized in the following (DRAFT)  documents:

Points of discussion:

  • Novel examples shouldn’t be substituted for the ones already in the CRM without getting the SIG’s agreement first. They form part of the definition. If a translation group finds a particular example underinformative and would like to use another instead, they should bring this issue to the SIG.
  • Regarding the translation order: the classes and properties that appear in the introduction should probably rank higher than the ones that are not mentioned in it. Maybe another column should be implemented that considers this aspect. This has been done for the French translation but it has not been added in the shared documents. 
  • Introduce a “shortcut” procedure for raising an issue with the SIG when some part is not clear or cannot be properly translated.
  • Regarding ideas/questions to be included in the governance guidelines:
    • A very important issue is how to identify the groups undertaking translation projects, and then how to support them and ensure that effort is not duplicated.
    • Some CRM-SIG members should participate more actively in translation initiatives and identify issues as soon as they occur – this has been done before with the Chinese, German and Greek translations.
    • MR volunteered to participate in the group’s meetings –she has considerable experience with translating the IFLA standards, could assist in drafting the governance guidelines.

 

  2. Where translations should appear on the website. Proposals by GB:
  • Remove the translations section from the site altogether. Translations should be listed under whatever version they render in a different language (main Resources page> version number > translation in <whatever> language).
  • Add a subsite for translation initiatives (like we have for members, projects etc.) where information is given on the various groups and the languages they are translating CRM into –also information on contact persons etc.

Discussion points:

  • The “Translations” page needn’t be deleted –there might be incomplete translations of an official version (the translation process can have interim outputs –if one is to follow the hierarchical order proposed for the translation guideline). Partial translations could be listed in that space but also be displayed under resources (and appropriate version).
  3. MD produced a table of translation units for the introduction section of CIDOC CRM v7.1.1 for which he then provided equivalent (or at least comparable) parts in the introduction section of published versions predating it, and flagged all major changes among versions.

Proposal: implement an xml format which identifies translation units independently (through an identifier –section headers have changed so they are not 100% reliable) and then map it to its general super-section (in a hierarchical structure).

Diff between versions of the CIDOC CRM by Etz: https://cidoc-crm.org/html-dev/comparisons/

  • Terminology should be broken down to terms –each term to become a translation unit.
  • Identify which sections have a continuing identity (e.g. Monotonicity) and can be traced across versions.
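
To make the proposal above more concrete, here is a minimal sketch of translation units that carry stable identifiers and are each mapped to a parent section. The element and attribute names (`unit`, `id`, `parent`), the identifiers and the sample texts are all assumptions made for illustration; the actual XML format to be produced by FORTH is not specified here.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: every unit has a stable id and names its parent section.
V_OLD = """<document version="old">
  <unit id="intro" parent="">Introduction</unit>
  <unit id="intro.scope" parent="intro">Scope text, first wording.</unit>
  <unit id="intro.monotonicity" parent="intro">Monotonicity text.</unit>
</document>"""

V_NEW = """<document version="new">
  <unit id="intro" parent="">Introduction</unit>
  <unit id="intro.scope" parent="intro">Scope text, revised wording.</unit>
  <unit id="intro.monotonicity" parent="intro">Monotonicity text.</unit>
  <unit id="intro.terminology" parent="intro">Terminology text.</unit>
</document>"""


def load_units(xml_text):
    """Return {unit id: (parent id, text)} for one version of the document."""
    root = ET.fromstring(xml_text)
    return {
        u.get("id"): (u.get("parent"), (u.text or "").strip())
        for u in root.iter("unit")
    }


def path_to_root(unit_id, units):
    """Follow parent links upward to recover the unit's place in the hierarchy."""
    path = []
    while unit_id:
        path.append(unit_id)
        unit_id = units[unit_id][0]
    return list(reversed(path))


def compare(old, new):
    """Classify unit ids so only added or changed units need (re)translation."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            uid for uid in new.keys() & old.keys() if new[uid][1] != old[uid][1]
        ),
    }


old_units = load_units(V_OLD)
new_units = load_units(V_NEW)
diff = compare(old_units, new_units)
print(diff)
print(path_to_root("intro.terminology", new_units))
```

Because the comparison keys on identifiers rather than section headers, a unit whose header is renamed between versions would still be matched, and a translation group would only need to revisit the "added" and "changed" entries.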

Overall Decisions:

  • MR to be included in the Governance Guidelines discussion.
  • completed translations of official versions appear in the resources section under their respective version
  • HW: ETz and the team at FORTH to come up with a proposal re how unfinished versions of translations appear on the site  
  • HW: CB and the team at FORTH to come up with a proposal re the representation of official translation groups on the CRM site (see Issue 596 and implementation of the new link Activity documentation)
  • HW: ETz & MD to come up with a proposal for describing the parent-headers of independent translation units (to be applied to Terminology as well)

 

February 2022