Adopting the right metrics and methods to set up a fair compensation model is one of the current challenges in the translation and localization industry, as we will see in chapter 8. Whether such a compensation model should be based on post-editing effort, and to what extent post-editing effort correlates with quality measurements of MT output, are some of the main questions that the MTPE Training GALA SIG tried to address with the help of academic research.
According to Sergi Àlvarez Vidal, adjunct professor at the Universitat Oberta de Catalunya and PhD researcher at the Universitat Pompeu Fabra, who was the SIG’s guest in a session devoted to post-editing effort, academic research studies post-editing effort in terms of three interrelated dimensions established by Krings (2001): temporal, technical and cognitive effort. Temporal effort measures the time spent post-editing the MT output. Technical effort refers to the changes applied by the translator (mainly insertions and deletions) and is measured with edit distance or keystroke analysis. Cognitive effort relates to the cognitive processes that take place during post-editing and is measured with eye-tracking or think-aloud protocols.
While Krings (2001) claimed that post-editing effort could be determined as a combination of all three dimensions, none of the metrics currently in use accounts for all of them.
LSPs and clients usually measure post-editing effort a posteriori, that is, after post-editing has been completed, mostly with metrics based on Levenshtein distance (i.e. the number of changes made to the raw MT output during post-editing). Such metrics are data-driven and, as they are integrated into more and more CAT tools, are now widely adopted and known by many companies, LSPs and post-editors. Additionally, according to the survey conducted during this session, most current university programs include some form of training on metrics for measuring post-editing effort.
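To make the idea concrete, the following is a minimal sketch, in Python, of a Levenshtein-based metric: the character-level edit distance between the raw MT output and the post-edited segment, normalized into an ‘edit percentage’. Normalizing by the length of the longer string is one common convention, not a universal standard, and actual CAT tool implementations vary.

    def levenshtein(a: str, b: str) -> int:
        """Character-level edit distance (insertions, deletions, substitutions)."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                cost = 0 if ca == cb else 1
                curr.append(min(prev[j] + 1,          # deletion
                                curr[j - 1] + 1,      # insertion
                                prev[j - 1] + cost))  # substitution
            prev = curr
        return prev[-1]

    def edit_percentage(raw_mt: str, post_edited: str) -> float:
        """Share of the segment changed during post-editing, in percent."""
        longest = max(len(raw_mt), len(post_edited))
        return 100.0 * levenshtein(raw_mt, post_edited) / longest if longest else 0.0

    # One substituted character in a 23-character segment:
    print(f"{edit_percentage('The cat sat in the mat.', 'The cat sat on the mat.'):.1f}%")  # 4.3%

A score of 0% here means the post-editor accepted the segment unchanged, which, as discussed below, says nothing about the effort spent reading and verifying it.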
However, metrics which measure technical effort, such as edit distance, do not take cognitive effort into account: even if the text is not modified, or only a few changes are made, cognitive effort is still expended but not captured at all. This is particularly true with NMT, because the increase in quality makes it harder (i.e. requires more time) to identify errors and to make sure that the output is correct.
Temporal effort, the time spent on post-editing, is, according to academia, the most reliable measurement method, although it is not perfect either (it can vary amongst translators). Firstly, research shows that there is a correlation between the quality of the MT output and the post-editing time. Secondly, and perhaps most importantly, post-editing time reflects not only the technical effort needed to perform the editing, but also the cognitive effort required to detect errors and plan the necessary corrections. Thus, measuring post-editing time could be the most cost-effective and straightforward way of quantifying at least some of the cognitive effort involved in post-editing. The limitation is that time is difficult to track, and not all CAT tools make it possible yet (see chapter 8 for more insights on this challenge).
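Where a tool does expose segment focus events, per-segment time tracking can be as simple as the hypothetical sketch below. The event names and the words-per-hour calculation are illustrative assumptions, not a reference to any particular CAT tool’s API.

    import time

    class SegmentTimer:
        """Accumulates active post-editing time per segment (hypothetical logger)."""

        def __init__(self):
            self.totals = {}   # segment_id -> accumulated seconds
            self._open = {}    # segment_id -> timestamp when focused

        def focus(self, segment_id: str) -> None:
            # Called when the post-editor enters a segment.
            self._open[segment_id] = time.monotonic()

        def blur(self, segment_id: str) -> None:
            # Called when the post-editor leaves a segment.
            start = self._open.pop(segment_id, None)
            if start is not None:
                self.totals[segment_id] = self.totals.get(segment_id, 0.0) + time.monotonic() - start

    def words_per_hour(seconds: float, source_words: int) -> float:
        """Post-editing throughput for one segment."""
        return source_words * 3600.0 / seconds if seconds else float("inf")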
Leaving cognitive effort out of the equation has its risks. It could contribute to devaluing the perception of the post-editor’s job: if the effort is reduced to a mere value or percentage of the changes made, it could be assumed that zero changes equals zero effort. Unfortunately, for now it seems that cognitive effort is only measured in academic contexts (usually through pauses).
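As an illustration of pause-based measurement, the sketch below counts and sums pauses in a keystroke log. The 1.0-second threshold is an assumption made for the example; studies use different thresholds and often normalize by segment length.

    def pause_metrics(keystroke_times: list[float], threshold: float = 1.0):
        """Return (pause count, total pause time) for gaps at or above the threshold."""
        gaps = [b - a for a, b in zip(keystroke_times, keystroke_times[1:])]
        pauses = [g for g in gaps if g >= threshold]
        return len(pauses), sum(pauses)

    # A 2.5 s and a 1.2 s hesitation between bursts of typing:
    count, total = pause_metrics([0.0, 0.2, 0.4, 2.9, 3.1, 3.3, 4.5])
    print(count, round(total, 1))  # 2 3.7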
Cognitive effort is, of the three dimensions, the one that most affects the quality of the post-editor’s work: in a setting where it would be easier to translate from scratch (the post-editor would simply read the source and produce a target), MTPE requires an additional effort because the post-editor is forced to also look at the proposed MT output and verify whether it renders the source correctly.
This additional effort takes a psychological toll as well, not only because translators are forced into a certain way of working but also because they need to break through the fear barriers related to MTPE. Additionally, the fact that post-editors are expected to use the raw MT output ‘as much as possible’ is a source of strain.
Most academic research agrees that a multidimensional approach (i.e. one accounting for as many effort indicators as possible) is the best way to capture the quality of the MT output and, therefore, the effort involved in post-editing.
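Purely as a hypothetical illustration of what such a multidimensional indicator could look like, the sketch below averages the three dimensions discussed above. There is no standard formula; the scaling factors and equal weights are arbitrary assumptions for the sake of the example.

    def combined_effort(edit_pct: float, seconds_per_word: float,
                        pause_seconds_per_word: float) -> float:
        """Average of three indicators, each scaled to a rough 0-1 range (arbitrary)."""
        technical = min(edit_pct / 100.0, 1.0)            # share of text changed
        temporal = min(seconds_per_word / 10.0, 1.0)      # assume 10 s/word caps the scale
        cognitive = min(pause_seconds_per_word / 5.0, 1.0)
        return (technical + temporal + cognitive) / 3

    print(round(combined_effort(4.3, 3.0, 1.8), 2))  # 0.23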