CyTerm iteration plan

Author: Chris Turner


Phase 1

A web site will be created.
The vision, business case and other management documents (including this one) will be published to attract potential clients. On-line registration forms will allow clients to join the project. Cycom is registered under the Data Protection Act to hold this client data. A simple downloadable personal termbank application (written in Java 1.2) will be the first offering. Clients will be able to download the application but will also need JRE 1.2. Their subscription will be debited from an account, so no money need change hands, on the understanding that they must contribute terms for sale and that those terms will pass into the public domain if their account is not settled.
Clients paying in advance may request a CD-ROM containing JRE 1.2 and the application.
Termbanks must be backed up (encrypted) at Cycom with an identity and password. Cycom will not be able to access the terms without the identity and password.
There is no sharing at this stage, so management information can be omitted.
Term data comprises author, subject, concept ID, terms, languages, definitions, context, explanation, sources, project subset, usage, grammar and term type.
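The term data listed above could be modelled along the following lines. This is only an illustrative sketch in Python, not the planned Java implementation; the class name, field names and field types are assumptions, and only the list of data categories comes from this plan.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TermEntry:
    """One concept entry in a personal termbank (hypothetical layout)."""

    concept_id: str
    author: str
    subject: str
    # Terms keyed by language code, e.g. {"en": ["termbank"]} (assumed shape).
    terms: Dict[str, List[str]] = field(default_factory=dict)
    # One definition per language code (assumed shape).
    definitions: Dict[str, str] = field(default_factory=dict)
    context: str = ""
    explanation: str = ""
    sources: List[str] = field(default_factory=list)
    project_subset: str = ""
    usage: str = ""
    grammar: str = ""
    term_type: str = ""

    def languages(self) -> List[str]:
        """Derive the language list from the terms instead of storing it twice."""
        return sorted(self.terms)


entry = TermEntry(concept_id="C0001", author="A. Author", subject="terminology")
entry.terms["en"] = ["termbank", "term bank"]
entry.terms["fr"] = ["banque de terminologie"]
print(entry.languages())
```

Deriving the languages from the terms is one possible design choice; storing them as a separate field would match the list above more literally.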
The software will have "update software", "backup to Cycom" and "restore from Cycom" menu items.
Search is free text. The software language is English. The import/export format is a subset (possibly TMX) of the ISO draft standard X-MARTIF XML, and HTML. Input is restricted to languages that can be written in ISO 8859-1.
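As a rough illustration of the two constraints above, the sketch below shows a case-insensitive free-text search and a check that input fits ISO 8859-1. It is written in Python for brevity; the function names and the dictionary layout of the termbank are assumptions, not part of the planned design.

```python
from typing import Dict, List


def is_latin1(text: str) -> bool:
    """Return True if every character can be encoded as ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


def free_text_search(bank: Dict[str, Dict[str, str]], query: str) -> List[str]:
    """Return the ids of entries in which any field contains the query.

    Matching is a plain case-insensitive substring test (an assumed
    interpretation of "free text").
    """
    needle = query.lower()
    return sorted(
        concept_id
        for concept_id, fields in bank.items()
        if any(needle in value.lower() for value in fields.values())
    )


# Hypothetical two-entry termbank used only for this example.
bank = {
    "C0001": {"term": "termbank", "definition": "A collection of terms."},
    "C0002": {"term": "café", "definition": "Accented input within ISO 8859-1."},
}
print(free_text_search(bank, "TERM"))
print(is_latin1("café"), is_latin1("术语"))
```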

Phase 2

Relations and all other data categories from the CLS framework will be supported by the termbank software. Import/export will be full blind MARTIF XML. Input in many more character sets is supported, though probably not the harder ones (e.g. Chinese, Arabic). The software is localised for several Western locales.
Termbanks will be stored and advertised on-line and will be downloadable using an identity and password. The number of terms, subject area, author name, credentials, languages and bank/term subscription price (in cytokens) for each bank will be published. All members will be granted an overdraft facility of 30 cytokens. A cytoken does not have a defined monetary value at present but should be estimated at 1 euro. Groups of termbanks may be identified by an identity and password and may be searched on-line for a particular term component.
An on-line ordering system will be provided, with clients' virtual cytoken accounts being credited and debited by Cycom. An open market for cytokens will be established where members can exchange cytokens for cash if needed.
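The cytoken accounting described in this phase might work roughly as sketched below. The 30-cytoken overdraft comes from this plan; the class and function names, the use of whole cytokens, and the rule that a refused debit leaves both accounts unchanged are assumptions made for illustration.

```python
class InsufficientCytokens(Exception):
    """Raised when a debit would take an account past its overdraft limit."""


class CytokenAccount:
    OVERDRAFT_LIMIT = 30  # overdraft granted to every member in Phase 2

    def __init__(self, member_id: str) -> None:
        self.member_id = member_id
        self.balance = 0  # whole cytokens; a negative balance means overdrawn

    def credit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount

    def debit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.balance - amount < -self.OVERDRAFT_LIMIT:
            raise InsufficientCytokens(
                f"{self.member_id} would exceed the overdraft limit"
            )
        self.balance -= amount


def purchase(buyer: CytokenAccount, seller: CytokenAccount, price: int) -> None:
    """Debit the buyer first, so a refused debit leaves the seller untouched."""
    buyer.debit(price)
    seller.credit(price)


buyer = CytokenAccount("member-1")
seller = CytokenAccount("member-2")
purchase(buyer, seller, 20)  # buyer is now 20 cytokens overdrawn
try:
    purchase(buyer, seller, 20)  # would take the buyer to -40, past the limit
except InsufficientCytokens as error:
    print("refused:", error)
print(buyer.balance, seller.balance)
```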

Phase 3

The terms in individual authors' termbanks will be exported to a central termbank, validated and merged with terms from other authors for the same concept. A single concept entry may now contain terms for many languages contributed by many authors. The software now supports access control and versioning at the term level of granularity. Terms can be updated by many authors without loss of data. Read-only extracts from the central termbank can be produced for particular subjects, languages and authors, but these are no longer master copies. There is enough audit information to permit the licence fees paid by a user to be shared between term contributors. The software supports Chinese, Arabic, Japanese and other difficult input methods. The software is localised in these languages as well. Export/import formats include the LISA TBX standard. Web access to terms is now possible on a pay-per-term basis and may be open to non-members. Fuzzy searching is supported.
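The merging and fee-sharing ideas in this phase can be sketched as follows. The flat tuple export format, the function names, and in particular the rule of splitting a fee in proportion to the number of terms each author contributed are assumptions; this plan only states that audit information will make some form of sharing possible.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Iterable, Tuple

# (concept_id, language, term, author): a hypothetical flat export format.
Contribution = Tuple[str, str, str, str]
# concept_id -> language -> term -> contributing author
CentralBank = Dict[str, Dict[str, Dict[str, str]]]


def merge_by_concept(contributions: Iterable[Contribution]) -> CentralBank:
    """Group terms by concept and language, recording who contributed each."""
    central: CentralBank = {}
    for concept_id, language, term, author in contributions:
        by_language = central.setdefault(concept_id, {})
        # The first contributor of a term is kept; later duplicates are ignored.
        by_language.setdefault(language, {}).setdefault(term, author)
    return central


def share_fee(central: CentralBank, concept_id: str, fee: int) -> Dict[str, Fraction]:
    """Split a fee in proportion to terms contributed (assumed rule)."""
    counts: Dict[str, int] = defaultdict(int)
    for terms in central.get(concept_id, {}).values():
        for author in terms.values():
            counts[author] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {author: Fraction(fee * n, total) for author, n in counts.items()}


contributions = [
    ("C1", "en", "termbank", "alice"),
    ("C1", "fr", "banque de termes", "bob"),
    ("C1", "de", "Terminologiedatenbank", "alice"),
    ("C1", "en", "termbank", "bob"),  # duplicate term, not credited again
]
central = merge_by_concept(contributions)
shares = share_fee(central, "C1", 3)
print(sorted(central["C1"]))
print({author: str(amount) for author, amount in shares.items()})
```

Exact fractions are used so that the shares always sum to the fee; a real system would need a rounding policy for whole cytokens.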

Phase 4

Translation memory software is released, linked to the termbank. Import/export is in TMX format. The interface to word processors is by cut and paste.

Phase 5

Document parsing/tagging software can recognise some document markup formats (e.g. RTF, HTML) and has some knowledge of grammar (e.g. English grammar). Terms and translation units can be identified automatically. This can be used to research terms, populate translation memories, and perform rough machine translation of some documents. The translation memory interface to word processors will be more streamlined.

Phase 6

Translation memories can be marked up to show grammar and can be linked to grammar rules. Machine translation quality improves and translation memory matching becomes more intelligent.

Phase 7

Software agents can scan source documents and automatically update target documents or schedule parts of them for retranslation. Document workflows are automated. Distributed editing of documents is supported, with locking, mirroring and auditing to ensure no loss of data.

Members' skills are advertised and a multi-party consortium can be quickly assembled via on-line blackboards to bid for specific projects. A virtual exchange for document projects will be created.


The iterations focus on terminology collection as a basic enabling technology and build this up from the word to the sentence and finally the document level of collection. The iterations start by ensuring shared meanings and understanding and end by permitting real-time collaboration in a document engineering project. The economic needs of the actors are recognised at each iteration, and it is hoped that members will experience continuing benefit from continuing membership.


The deliverables and completion dates are subject to revision in the light of experience gained during the project and feedback from members.