Vision of multilingual document engineering
Author: Chris Turner, www.cycom.co.uk
Overview
Multilingual documents are the result of a team of people collaborating,
each contributing a particular skill or expertise. The collaborators
share common documents and termbanks, and most can read and
write to these documents and termbanks. Good communications and the CyTerm
computer software support tools ensure that read and write access to the
documents is coordinated to minimise conflicts and detect inconsistencies.
The software provides automatic notification of changes in the documents
so that new workflows can be initiated with minimal delay. The termbanks
provide a resource which ensures that language specific terms are mapped
to the correct concepts unambiguously and consistently.
The team of collaborators includes individuals and members of different
companies. It includes the monolingual source language document authors
as well as the monolingual target language document adapters.
Documents change with time and the CyTerm software tools ensure that
small changes to the source documents result in proportionately small changes
to the derived or dependent target documents, automating the implementation
of derived document changes as much as possible. The software system has
a memory so that information entered once does not need to be reentered.
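The idea of a memory can be sketched as a simple lookup of previously translated segments. The Python below is only an illustration under stated assumptions: the class name `TranslationMemory`, its methods, exact-match lookup and the example sentences are all hypothetical and are not taken from the CyTerm software.

```python
from typing import Dict, List, Tuple


class TranslationMemory:
    """Hypothetical exact-match memory of source segments and their translations."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def store(self, source: str, target: str) -> None:
        # Remember a translation so that it never has to be entered again.
        self._entries[source] = target

    def split(self, segments: List[str]) -> Tuple[Dict[str, str], List[str]]:
        # Separate segments that are already translated from those needing work.
        known = {s: self._entries[s] for s in segments if s in self._entries}
        pending = [s for s in segments if s not in self._entries]
        return known, pending


memory = TranslationMemory()
memory.store("Open the valve.", "Öffnen Sie das Ventil.")
memory.store("Close the lid.", "Schließen Sie den Deckel.")

# A revised source document in which only the second sentence has changed.
revised = ["Open the valve.", "Close the lid firmly."]
known, pending = memory.split(revised)
```

Only the changed sentence ends up in `pending`, so a small change to the source leads to a proportionately small amount of new work. A real system would probably also need approximate matching, which this sketch leaves out.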
The actors, processes and supporting resources will now be described
in an approximately linear flow but the real system will be iterative and
some processes will be performed concurrently. The actors are described
by their responsibilities and roles. There is not necessarily one person
for each role and sometimes one person may fulfil several roles.
Concepts, Terms, and Mark-up
Terms are words in a particular language that are used to express a concept.
The concept itself is ideally universal and independent of language. The
Concept is an artificial device that is used to group several language
dependent Terms that all express the same concept. It is convenient to
use this Concept device since it provides a place to store information
(e.g. a picture, or a link to a related concept) that is common to all
terms that express the concept and it provides a pathway to navigate from
a term in one language to a term in another language. Mark-up is annotation
that has been added to a part of the text which provides extra clarification
of the meaning of the main text. This mark-up is normally invisible to
expert audiences that do not need to see it, but it can be made visible
to less expert audiences, such as non-specialists and computer software,
which would have difficulty decoding the text without this supplementary
mark-up.
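The relationship between Concepts and Terms can be illustrated with a small data model. The Python below is a sketch only and not the CyTerm implementation: the class names `Term`, `Concept` and `Termbank`, the field names and the example entries are all assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    text: str       # the word or phrase itself
    language: str   # language code such as "en" or "de"


@dataclass
class Concept:
    concept_id: str
    # Information common to every term that expresses the concept.
    picture: Optional[str] = None
    related: List[str] = field(default_factory=list)
    terms: List[Term] = field(default_factory=list)

    def terms_in(self, language: str) -> List[str]:
        return [t.text for t in self.terms if t.language == language]


class Termbank:
    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}

    def add(self, concept: Concept) -> None:
        self.concepts[concept.concept_id] = concept

    def concepts_for(self, text: str, language: str) -> List[Concept]:
        # A term may express more than one concept, so return all matches.
        return [c for c in self.concepts.values()
                if any(t.text == text and t.language == language
                       for t in c.terms)]

    def translate(self, text: str, source: str, target: str) -> Dict[str, List[str]]:
        # Navigate term -> concept -> terms in the target language.
        return {c.concept_id: c.terms_in(target)
                for c in self.concepts_for(text, source)}


termbank = Termbank()
termbank.add(Concept("C1", terms=[Term("bank", "en"), Term("Bank", "de")]))
termbank.add(Concept("C2", terms=[Term("bank", "en"), Term("Ufer", "de")]))
candidates = termbank.translate("bank", "en", "de")
```

Here the English term "bank" leads to two concepts, each with its own German term. This is the kind of ambiguity that mark-up naming the intended concept is meant to resolve.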
The Original Source Author has a critically important role to play in the
document engineering process. The source author is the highest authority
on the subject message that he intends to convey to the readers and of
what concepts he intends to convey when he uses a particular term, phrase,
or sentence structure. His understanding of what is meant by a term or
sentence might not be shared by others. When composing the source document
he should ensure that every term and sentence that he uses is unambiguously
mapped to a concept or concepts stored in a terminology databank. If he
finds a term is not in the termbank he shall provide an entry and a definition.
If he finds that a term maps to several different concepts in the termbank
he shall mark up his document to indicate which concept he intends. If he finds
that a sentence can be interpreted ambiguously he shall rewrite the sentence
resolving the ambiguity or provide a mark-up resolving the ambiguity.
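The term checks asked of the author can be sketched as a small routine. This is a minimal illustration under stated assumptions: the function name `check_terms`, the representation of the termbank as a mapping from a term to concept identifiers, and the mark-up as a mapping from a term position to the intended concept are all hypothetical. Sentence-level ambiguity is not covered.

```python
from typing import Dict, List, Tuple


def check_terms(terms: List[str],
                termbank: Dict[str, List[str]],
                markup: Dict[int, str]) -> List[Tuple[int, str, str]]:
    """Report terms that the author still has to deal with.

    terms    -- the terms of the document in reading order (assumed pre-extracted)
    termbank -- maps a term to the identifiers of the concepts it may express
    markup   -- maps a term position to the concept identifier the author intends
    """
    issues = []
    for position, term in enumerate(terms):
        concepts = termbank.get(term, [])
        if not concepts:
            # Not in the termbank: the author provides an entry and a definition.
            issues.append((position, term, "missing"))
        elif len(concepts) > 1 and markup.get(position) not in concepts:
            # Several concepts and no valid mark-up: the author marks the intended one.
            issues.append((position, term, "ambiguous"))
    return issues


example_termbank = {"bank": ["C1", "C2"], "loan": ["C3"]}
example_terms = ["bank", "loan", "overdraft", "bank"]
example_markup = {0: "C1"}  # the first "bank" has already been marked up
issues = check_terms(example_terms, example_termbank, example_markup)
```

In this example the first "bank" passes because it has been marked up, "overdraft" is reported as missing from the termbank, and the second "bank" is reported as ambiguous.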
If the Original Source Author cannot be persuaded to fulfil all these
tasks, then a proxy for the original source author must be found. The Author
Proxy will be a subject specialist and be monolingual. He will be capable
of unambiguously decoding all the writing of the Original Source Author.
The Target Subject Specialist knows the terms used for his specialist subject
in his native (target) language. He is substantially monolingual but
is able to read sufficiently well in a foreign source language to be able
to recognise a concept when it has been verbosely defined in that foreign
source language. Note that he does not need to recognise specialist terms
in the source language, nor does he need to be able to decode complex specialist
language constructs in the source language. His role is to contribute terms
and their definitions to the termbank for his specialism and native
language. He may work reactively, being notified of new concepts that
have been added to the termbank and for which no term has yet
been entered in his language.
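The reactive way of working can be sketched as a query for concepts that lack a term in the specialist's language. The Python below is an illustration only: the function name `concepts_missing_term` and the layout of the termbank as a mapping from concept identifier to terms per language are assumptions.

```python
from typing import Dict, List


def concepts_missing_term(termbank: Dict[str, Dict[str, List[str]]],
                          language: str) -> List[str]:
    """Return the identifiers of concepts that have no term in the given language."""
    return [concept_id
            for concept_id, terms_by_language in termbank.items()
            if not terms_by_language.get(language)]


# Hypothetical termbank: concept identifier -> language code -> terms.
sample_termbank = {
    "C1": {"en": ["galvanising bath"], "de": ["Verzinkungsbad"]},
    "C2": {"en": ["flux"]},
    "C3": {"en": ["dross"], "de": []},
}
todo_for_german_specialist = concepts_missing_term(sample_termbank, "de")
```

The resulting list is what a notification to the German subject specialist would contain in this sketch.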
The Translator is bilingual and can read and understand texts in the source
language and can accurately express the text in the target (native) language
for a range of specialist subjects. Note that the Translator does not need
to disambiguate complex language constructs since the Original Source Author
(or his Author Proxy) has already done that by marking up the document.
Neither does the Translator need to know the meaning of all terms in the
source document since the meaning can be looked up in the termbank. Neither
does the Translator need to know the specialist target language term since
that can also be looked up in the termbank (it was placed there by the
Target Subject Specialist).
The Target Language Document Adapter is monolingual and knows the needs and
abilities of a target audience for a document. His role is to adapt the
message of the author to suit
a particular audience and purpose that could not be fully considered by
the author. He only needs to be monolingual since an accurate message of
the author has been provided by the translator.
Changes to Documents
When changes are made to a document, the entire process involving
all collaborators can be repeated, treating only the changed part of the
text plus sufficient locating context as input to the process. Software
tools and the information provided by mark-up provide sufficient means
for locating and managing the edits required in all documents.
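Extracting the changed part of a text together with its locating context can be sketched with the `difflib` module from the Python standard library. This is an illustration and not the CyTerm method: the function name `changed_segments`, the choice of whole sentences as the unit of comparison and the amount of context are assumptions.

```python
import difflib
from typing import Dict, List


def changed_segments(old: List[str], new: List[str],
                     context: int = 1) -> List[Dict[str, object]]:
    """List each change between two versions with neighbouring sentences as context."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    segments = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged text does not re-enter the process
        segments.append({
            "change": tag,                  # "replace", "delete" or "insert"
            "old": old[i1:i2],
            "new": new[j1:j2],
            "context_before": new[max(0, j1 - context):j1],
            "context_after": new[j2:j2 + context],
        })
    return segments


old_version = ["Fill the bath.", "Heat to 450 degrees.", "Dip the part.", "Cool in air."]
new_version = ["Fill the bath.", "Heat to 460 degrees.", "Dip the part.", "Cool in air."]
work_items = changed_segments(old_version, new_version)
```

Only the altered sentence is returned, together with the sentence on either side of it, so the collaborators further down the chain can locate the change without reprocessing the whole document.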
Why this vision is likely to become reality
In the current translation market, an individual translator takes on all
roles to a greater or lesser extent. Because every additional role demands a
further ability, it becomes less likely that a customer
can find an available individual translator who can satisfy all the roles. An
example might be an Icelandic to Chinese specialist in hot metal galvanising
machines. If such a specialist did exist he might not be able to find enough
regular work to support himself and would probably be doing something else.
In contrast, there is a much better chance of finding the abilities in different
persons. This has not happened on a large scale so far
because the communication overhead between team members is not
efficiently supported and there is no economic framework which allows all
collaborators to benefit. I believe that communications, document mark-up
and electronic trading technologies are now sufficiently advanced
to permit such collaborative work to be practically efficient. This efficiency
will reduce the costs of the suppliers. The separation of skills will also
increase the capacity of the suppliers. This gives such suppliers a competitive
advantage which will lead to increasing market share.
Missing components
The following components are poorly implemented at present. The CyTerm
project aims to provide good implementations.
- Mechanisms for term contributors to be rewarded by the term users.
- Mark-up editors.
- Mechanisms for creating/importing/exporting terms to/from termbanks.
- Version control and document differencing/merging.
- Electronic commerce for small transactions and a virtual commercial exchange room.
- Automatic change notification events which can trigger workflows.
Deliverables and funding
The CyTerm project will deliver software tools implementing the missing
components listed above. Funding will come from forming a club and charging
a membership subscription.
There will also be an on-line service for coordinating software and
termbank updates and for providing a trading mechanism for member-to-member
trade. This will be funded and managed by Cycom Limited.
The deliverables will be determined in consultation with the members
but the infrastructure components already identified include:-
- Member database including qualifications for trust building purposes.
- Termbank (following CLS framework with Blind Martif and TBX import/exports).
- TMS including review mechanisms.
- Translation memory (with TMX import/exports).
- Termbank proxy to other on-line termbanks.
- Pay-per-use accounting for alternative term trading models.
- Interfaces to word processors and document formats.
- Term research tools including web search and term discovery from docs.