BIK Terminology

Solving the terminology puzzle, one posting at a time

  • Author

    Barbara Inge Karsch - Terminology Consulting and Training


Archive for the ‘Designing a terminology database’ Category

Terminology extraction with memoQ 5.0 RC

Posted by Barbara Inge Karsch on August 15, 2011

In the framework of a TermNet study, I have been researching and gathering data about terminology management systems (TMS). We will not focus on term extraction (TE) tools, but since one of our tool candidates recently released a new term extraction module, I wanted to check it out. Here is what I learned from giving the TE functionality of the memoQ 5.0 release candidate a good run.

Let me start by saying that this test made me realize again how much I enjoy working with terminological data; I love analyzing terms and concepts, researching meaning and compiling data in entries; to me it is a very creative process. Note furthermore that I am not an expert in term extraction tools: I was a serious power user of several proprietary term extraction tools at JDE and Microsoft; I haven’t worked with the Trados solution since 2003; and I have only played with a few other methods (e.g. Word/Excel and SynchroTerm). So, my view of the market at the moment is by no means a comprehensive one. It is, however, one of a user who has done some serious term mining work. One of the biggest projects I ever did was the Axapta 4.0 specs. It took us several days just to load all documents onto a server directory; it took the engine at least a night to “spit out” 14,000 term candidates; and it took me an exhausting week to nail down 500 designators worth working with.

As a mere user, as opposed to a computational linguist, I am not primarily interested in the performance of the extraction engine (I actually think the topic is a bit overrated); I like that in memoQ I can set the minimum/maximum word lengths, the minimum frequency, and the inclusion/exclusion of words with numbers (the home-grown solutions had predefined settings for all of this). But beyond the rough selection, I can deal with either too many or too few suggestions, if the tool allows me to quickly add or delete what I deem the appropriate form. There will always be noise, and lots of it. I would rather have the developer focus on the usability of the interface than “waste” time on tweaking algorithms a tiny bit more.
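As a toy illustration of those rough-selection settings (minimum/maximum word length, minimum frequency, exclusion of words with numbers), here is a naive n-gram candidate extractor. This is my own sketch of the concept, not memoQ’s actual algorithm:

```python
from collections import Counter

def extract_candidates(tokens, min_len=2, max_len=4, min_freq=2, allow_numbers=False):
    """Naive n-gram term extraction with the rough-selection settings
    described above. Counts every n-gram in the length range, optionally
    skips n-grams containing digits, and keeps the frequent ones."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not allow_numbers and any(c.isdigit() for w in gram for c in w):
                continue
            counts[" ".join(gram)] += 1
    # everything below the frequency threshold is treated as noise
    return {term: freq for term, freq in counts.items() if freq >= min_freq}

tokens = ("terminology management system is a terminology management "
          "system for terminology work").split()
candidates = extract_candidates(tokens, min_len=2, max_len=3, min_freq=2)
# candidates now holds "terminology management", "management system",
# and "terminology management system", each with frequency 2
```

A real engine adds stemming, stop-word filtering and statistical ranking on top; the point here is only that the user-facing knobs are simple filters over a candidate list.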

So, along the lines of the previous posting on UX design, my requirements for a TE tool are that it allows me to

  • Process term candidates (go/no-go decision) extremely fast and
  • Move data into the TMS smoothly and flawlessly.

memoQ by Kilgray Translation Technologies* meets the first requirement very nicely. My (monolingual) test project was the PowerPoint presentations of the ECQA Certified Terminology Manager, which I had gone through in detail the previous week and which contained 28,979 English words. Because the subject matter is utterly familiar to me, there was no question as to what should make the cut and what shouldn’t. I loved that I could “race” through the list and go yay or nay; that I could merge obvious synonyms; and that I could modify term candidates to reflect their canonical form. Because the contexts for each candidate are all visible, I could have even checked the meaning in context quickly if I had needed to.

I also appreciated that there is already a stop word list in place. It was very easy to add to it, although here comes one suggestion: it would be great to have the term candidate automatically inserted in the stop-word dialog. Right now, I still have to type it in; it would save time if it were prefilled. Since the stop word list is not very extensive (e.g. even words like “doesn’t” are missing from the English list), it will take everyone considerable time to build up a list which, at its core, will not vary substantially from user to user. But that may be too much to ask of a first release.

As for my second requirement, memoQ term extraction doesn’t meet that (yet) (note that I only tested the transfer of data to memoQ, but not to qTerm). I know it is asking for a lot to have a workflow from cleaned-up term candidate list to terminological entry in a TMS. Here are two suggestions that would make a difference to users:

  • Provide a way to move context from the source document, incl. context source, into the new terminological entry.
  • Merging terms into one entry because they are synonyms is great. But they need to show up as synonyms when imported into the term base; none of my short forms (e.g. POS, TMS) showed up in the entry for the long forms (e.g. part of speech, terminology management systems) when I moved them into the memoQ term base.
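What I am asking for in the second bullet can be sketched with a concept-oriented data model, in which one entry holds a list of synonymous terms. The structure below is a hypothetical illustration, not memoQ’s or qTerm’s internal model:

```python
def merge_as_synonyms(term_base, long_form, short_form):
    """Merge a short form into the entry of its long form so that both
    survive import as synonyms of one concept. One dictionary entry per
    concept; the entry's term list carries all synonyms."""
    entry = term_base.setdefault(long_form, {"terms": [long_form]})
    if short_form not in entry["terms"]:
        entry["terms"].append(short_form)
    return term_base

term_base = {}
merge_as_synonyms(term_base, "terminology management system", "TMS")
merge_as_synonyms(term_base, "part of speech", "POS")
# two concept entries, each listing long form and short form together
```

On import, a termbase built this way would show “TMS” inside the entry for “terminology management system” rather than dropping it.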

My main overall wish is that we integrate TE with authoring and translation in a way that allows companies and LSPs, writers and translators to have an efficient workflow. It is imperative in technical communication/translation to document terms and concepts. When this task is left to the translators, it happens quite late, but that is better than its not happening at all. Only fast and flawless processing will allow one-person or multi-person enterprises, for that matter, to carry out terminology work as part of the content supply chain. When the “fast and flawless” prerequisite is met, even those of my translator-friends who detest the term “content supply chain” will have enough time to enjoy the more creative aspects of their profession. Then the economic requirements essential on the macro level are met, and on the micro level the individual’s need to get satisfaction out of the task is fulfilled. The TE functionality of memoQ 5.0 RC excels in design and, in my opinion, is ready for translators’ use. If you have any comments, or if you agree or disagree with me, I’d love to hear it.

*Kilgray is a client of BIK Terminology.

Posted in Designing a terminology database, memoQ, Producing quantity, Selecting terms, Term extraction tool, Usability | Tagged: | 3 Comments »

HCI International 2011

Posted by Barbara Inge Karsch on August 11, 2011

In July, I spent two days at Human Computer Interaction International 2011 in Orlando, Florida, with hundreds of UX designers, usability analysts, engineers and researchers from around the world. It surprised me that language as part of usability was mentioned just a few times. Furthermore, I didn’t expect to hear so much about the struggle of usability professionals within company hierarchies and cultures. It also occurred to me that many terminology management systems (TMS) may not have taken usability all that seriously so far.

Challenged by a missed flight and an extra night in DC, I managed to attend about 40 presentations. None of them even mentioned language, let alone terminology, as a focus point or issue, although Helmut Windl from Continental Automotive GmbH did open his paper on Empathy as a Key Factor for Successful Intercultural HCI Design with a wonderful series of translation errors. Linguistic faux pas are always good for a laugh. As you might expect, my own paper, Terminology Precision—A Key Factor in Product Usability and Safety, focused on avoiding such faux pas, particularly in the life sciences, where blunders could be less than funny.

What came across in more than one presentation is that UX professionals, like language professionals, struggle with their status in an enterprise. Clemens Lutsch from Microsoft Deutschland GmbH gave a good presentation on making the case for usability standards to management that had useful ideas for us terminologists as well, e.g., what he called “the trap of the cost is already there”. What he means with this is that existing roles already take care of the task, say, user-centered design or, for us, something like term formation, so why bother changing anything. The awareness that these employees may not have the right skill set does not (always) exist. Usability folks and terminologists can form alliances on more than one front.

[Image: Usability Standards across the Development Lifecycle, by Theofanos and Stanton]

Lutsch’s presentation was part of a whole session on ISO usability standards and enterprise software. The award-winning paper of this track (Design, User Experience, and Usability) by Theofanos and Stanton of the National Institute of Standards and Technology (US) presented a comprehensive overview of all the standards provided or proposed by the respective ISO technical committee(s) and IEC. The graphic above, which stems from the paper, has lots of detail. But the main point of showing it here is that it has the user at the center and that any and all design tasks revolve around user needs.

I have participated in software development for terminology management systems (as well as in others) and this view was never the prevailing one. The result was often that TMS users struggled with the software: They would rather work in Excel and then import the data than work in the interface that was to support and facilitate their work.

So, here is a challenge to the designers and developers of TMS: don’t just provide systems that do a wonderful job hosting data; provide systems that allow us to do terminology work efficiently and reliably. In Quantity AND Quality, I discussed a few of the easy things that can be done at the interface level. I would love to see tools developed following not only the soon-to-be-released ISO 26162, but also the usability standards put forth by ISO TC 159 (Ergonomics). By the same token, let the usability and ergonomics people in the committee inspire the rest of their industry. After all, their scope includes “standardization in the field of ergonomics, including terminology, methodology, and human factors data.”

Posted in Designing a terminology database, Events, Usability | Tagged: , , , , | Leave a Comment »

Quantity AND Quality

Posted by Barbara Inge Karsch on September 16, 2010

In If quantity matters, what about quality? I promised to shed some light on how to achieve quantity without skimping on quality. In knowledge management, it boils down to solid processes supported by reliable and appropriate tools and executed by skilled people. Let me drill down on some aspects of setting up processes and tools to support quantity and quality.

If you cannot afford to build up an encyclopedia for your company (and who can?), select metadata carefully. The number and types of data categories (DCs), as discussed in The Year of Standards, can make a big difference. That is not to say use fewer; use the right ones for your environment.

Along those lines, hide data categories or values where they don’t make sense. For example, don’t display Grammatical Gender when Language=English; invariably a terminologist will accidentally select a gender, and even if only a few users wonder why that is, or notice the error but can’t find a way to alert you to it, too much time is wasted. Similarly, hide Grammatical Number when Part of Speech=Verb, and so on.
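Such display rules can be expressed as simple dependencies between data categories. The sketch below is a generic illustration with invented field names, not the configuration syntax of any particular TMS:

```python
# Hypothetical dependency rules: a data category is suppressed when a
# related field has a given value.
HIDDEN_WHEN = {
    "grammaticalGender": [("language", "en")],
    "grammaticalNumber": [("partOfSpeech", "verb")],
}

def visible_fields(entry, all_fields):
    """Return only the data categories a front end should display for
    this entry, per the dependency rules above."""
    shown = []
    for field in all_fields:
        conditions = HIDDEN_WHEN.get(field, [])
        if any(entry.get(dep) == value for dep, value in conditions):
            continue  # e.g. no Grammatical Gender for English entries
        shown.append(field)
    return shown

fields = ["term", "language", "partOfSpeech", "grammaticalGender", "grammaticalNumber"]
entry = {"term": "browser", "language": "en", "partOfSpeech": "noun"}
# visible_fields(entry, fields) omits grammaticalGender for this English noun
```

The benefit is that the rule lives in one table rather than being scattered across the interface code.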

Plan dependent data, such as product and version, carefully. For example, if versions for all your products are numbered the same way (e.g. 1, 2, 3, …), it might be easiest to have two related tables. If most of your versions have very different version names, you could have one table that lists product and version together (e.g. Windows 95, Windows 2000, Windows XP, …); it makes information retrieval slightly simpler, especially for non-expert users. Or maybe you cannot afford to, or don’t need to, manage down to the version level because you are in a highly dynamic environment.
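The two modeling options can be sketched side by side; the product data below is only illustrative:

```python
# (a) Two related tables: products, and versions keyed by product id.
# Works well when versions follow one numbering scheme.
products = {1: "Windows"}
versions = {(1, "95"), (1, "2000"), (1, "XP")}

def versions_of(product_id):
    """List the versions of one product in model (a)."""
    return sorted(version for pid, version in versions if pid == product_id)

# (b) One combined lookup listing product and version together;
# slightly simpler for non-expert users to pick from.
product_versions = ["Windows 95", "Windows 2000", "Windows XP"]
```

Model (a) avoids repeating the product name; model (b) trades that redundancy for a flatter pick list.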

Enforce mandatory data when a terminologist releases (approves or fails) an entry. If you have decided that five out of your ten DCs are mandatory, let the tool help terminologists by not letting them get away with a shortcut or an oversight.
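The enforcement itself is a small amount of tool logic. A minimal sketch, with an invented set of mandatory data categories:

```python
# Hypothetical mandatory data categories for this database design.
MANDATORY = {"term", "language", "partOfSpeech", "definition", "status"}

def release_entry(entry):
    """Approve an entry only when every mandatory data category is
    filled; otherwise refuse with a list of what is missing."""
    missing = MANDATORY - {field for field, value in entry.items() if value}
    if missing:
        raise ValueError(f"cannot release entry, missing: {sorted(missing)}")
    entry["released"] = True
    return entry

complete = {"term": "browser", "language": "en", "partOfSpeech": "noun",
            "definition": "software that renders web pages", "status": "new"}
release_entry(complete)  # succeeds; an incomplete entry would raise
```

The point is that the check runs at release time, so a shortcut taken during drafting never reaches the published termbase.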

It is obviously not an easy task to anticipate what you need in your environment. But well-designed tools and processes support high quality AND quantity and therefore boost your return on investment.

On a personal note, Anton is exhausted with anticipation of our big upcoming event: He will be the ring bearer in our wedding this weekend.

Posted in Advanced terminology topics, Designing a terminology database, Producing quality, Producing quantity, Return on investment, Setting up entries, Terminologist, Tool | Tagged: , , | 1 Comment »

The Year of Standards

Posted by Barbara Inge Karsch on July 16, 2010

The Localization Industry Standards Association (LISA) reminded us in their recent Globalization Insider that they had declared 2010 the ‘Year of Standards.’ It resonates with me because socializing standards was one of the objectives that I set for this blog. Standards and standardization are the essence of terminology management, and yet practitioners either don’t know of standards, don’t have time to read them, or think they can do without them. In the following weeks, as the ISO Technical Committee 37 ("Terminology and other language and content resources") is gearing up for the annual meeting in Dublin, I’d like to focus on standards. Let’s start with ISO 12620.

ISO 12620:1999 (Computer applications in terminology—Data categories—Part 2: Data category registry) provides standardized data categories (DCs) for terminology databases; a data category is the name of the database field, as it were, its definition, and its ID. Did everyone notice that terminology can now be downloaded from the Microsoft Language Portal? One of the reasons why you can download the terminology today and use it in your own terminology database is ISO 12620. The availability of such a tremendous asset is a major argument in favor of standards.
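The practical payoff of standardized data categories is interchange: when two databases map their internal field names to the same standard categories, records can be converted without loss. A hypothetical sketch with invented field names:

```python
# Two databases with different internal field names, both mapped to the
# same standardized data category names (illustrative, not actual IDs).
OUR_FIELDS = {"pos": "partOfSpeech", "def": "definition"}
THEIR_FIELDS = {"wordClass": "partOfSpeech", "meaning": "definition"}

def to_standard(record, mapping):
    """Rename local fields to their standardized data category names."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}

def from_standard(record, mapping):
    """Rename standardized data categories back to local field names."""
    inverse = {std: local for local, std in mapping.items()}
    return {inverse[k]: v for k, v in record.items() if k in inverse}

ours = {"pos": "noun", "def": "software that renders web pages"}
# Round-trip: our record, expressed in the standard, lands cleanly
# in the other database's field names.
theirs = from_standard(to_standard(ours, OUR_FIELDS), THEIR_FIELDS)
```

This is exactly why downloadable terminology such as the Microsoft Language Portal data can be imported into a foreign database: both sides agree on what the fields mean.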

I remember when my manager at J.D. Edwards slapped 12620 on the table and we started the selection process for TDB. It can be quite overwhelming. But I turned into a big fan of 12620 very quickly: It allowed us to design a database that met our needs at J.D. Edwards.

When I joined Microsoft in 2004, my colleagues had already selected data categories for a MultiTerm database. Since I was familiar with 12620, it did not take much time to be at home in the new database. We reviewed and simplified the DCs over the years, because certain data categories chosen initially were not used often enough to warrant their existence. One example is ‘animacy,’ which is defined in 12620 as “[t]he characteristic of a word indicating that in a given discourse community, its referent is considered to be alive or to possess a quality of volition or consciousness”; most of the things documented in Term Studio are dead and have no will or consciousness. But we could simply remove ‘animacy,’ whereas it would have been difficult or costly to integrate a new data category late in the game. If you are designing a terminology database, err on the side of being more comprehensive. Because we relied on 12620, it was easy when, earlier in 2010, we prepared to make data exportable into the TBX format (ISO 30042). The alignment was already there, and communication with the vendor, an expert in TBX, was easy.

ISO 12620:1999 has since been retired and succeeded by ISO 12620:2009, which “provides guidelines […] for creating, selecting and maintaining data categories, as well as an interchange format for representing them.” The data categories themselves were moved into the ISOcat Data Category Registry, open to use by anyone.

ISO 12620, or now the Data Category Registry, allows terminology database designers to apply tried-and-true standards rather than reinventing the wheel. Like all standards, they enable quick adoption by those familiar with them, and they enable data sharing (e.g. in large term banks, such as the EuroTermBank). If you are not familiar with standards, read A Standards Primer, written by Christine Bucher for LISA. It is a fantastic overview that helps navigate the standardization maze.

Posted in Advanced terminology topics, Designing a terminology database, EuroTermBank, J.D. Edwards TDB, Microsoft Language Portal, Microsoft Terminology Studio, Terminologist | Tagged: , , , | 1 Comment »