PDF Assistant for Trados

By Trados AppStore Team

Free

Description

The application converts PDF files to DOCX so that you can improve the quality of the DOCX before translating it in Trados Studio. We convert before translation because PDF to DOCX conversion without professional editing software can cause formatting issues, resulting in a document that looks different from the original PDF. If you convert first, you can correct these issues before translating, which leads to a better experience overall.

Note that the quality of the conversion depends largely on the quality of the original PDF and on the conversion software used; some conversion tools produce better results than others. This "Add-In" initially uses the Microsoft Word desktop API to provide simple text conversion and some OCR capability. You could simply use Word and avoid the "Add-In" altogether, but the plugin provides more support than Microsoft makes available through Microsoft Word, particularly around OCR.


To learn how to use this application, please visit PDF Assistant for Trados in the RWS Community wiki

Technical details

2.0.1.0 - Trados Studio 2024

Changelog:

  • support for Studio 2024
  • TellMe features

Checksum: 646acf24d039d6c9ddfc9d559845d624f1c0f80f001628844a585351fde61219

Release date: 2024-06-25

1.1.1.0 - Trados Studio 2022 (SR1)

Changelog:

  • Addressed breaking changes:
      • Sdl.Core.Globalization (Modified)
      • Sdl.Core.Globalization.Async (Added)
      • Sdl.Desktop.Platform.Controls.Behaviors.MouseDownBehavior (Deleted)


Checksum: 24b4381411a532eaf98600ff6c24447ce1e3cf54236aac1559231ec99df590e1

Release date: 2023-06-20

1.0.1.1 - Trados Studio 2022

Changelog:

  • Corrected the plugin manifest so that the plugin will not attempt to install into Trados Studio 2022 SR1. This is important because the SR1 release contains breaking changes that cause this version of the plugin to prevent Studio from starting. There will be a further update of this plugin specifically for the 2022 SR1 release, either alongside or shortly after SR1 being made publicly available.


Checksum: a7cdbb57d99f1fc083cbdc435afe7b004f07f3be7109d5a832343a69c03cac12

Release date: 2023-06-06
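Each release above lists a checksum. The listing does not say which algorithm produced it; the values are 64 hexadecimal characters long, which is consistent with SHA-256, so the sketch below assumes SHA-256. The function names are illustrative and are not part of the plugin or of Trados Studio.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large downloads are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with a published checksum (assumed to be SHA-256)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

To use it, pass the path of the downloaded sdlplugin file together with the checksum published for that version. If the result is False, either the download is damaged or the assumption about the algorithm is wrong.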

1.0.0.13 - Trados Studio 2022

  • Implement OpenXml to optimize how we work with Drawings and Pictures in the Word document.
  • Display progress when loading and processing images in the Word document.
  • Marshal release of all interop.word references.
  • Display images fully and uniformly depending on the width of the image column.



The PDF Assistant for Trados is an Add-In for Trados Studio that supports the conversion of a PDF to a DOCX so it can be successfully translated and delivered as a DOCX target file.

Installation

The application is an sdlplugin. You can install it either by downloading it from the RWS AppStore and double-clicking the sdlplugin file in the usual way, or through the Integrated AppStore in Trados Studio. For the plugin to work you must have Microsoft Office installed. Testing was carried out on computers using Office 365, not on older versions.

Note that whilst this tool can do a decent job of converting most PDF files to DOCX, the process is not guaranteed to work 100% of the time. Working with PDF files can be a tricky business, and it's recommended that you review the section on "Working with PDFs" to understand where limitations can occur. If you cannot handle your PDF with this plugin, you may require a more sophisticated tool such as Abbyy FineReader or Adobe Acrobat Pro, which are designed specifically for working with this format.

Where is it installed?

The plugin is installed into the ribbon, in the "Toolbox" group on the "Add-Ins" tab.

Working with PDFs

The application converts PDF files to DOCX so that you can improve the quality of the DOCX before translating it in Trados Studio. We have taken this approach because PDF to DOCX conversion without professional editing software can cause formatting issues, resulting in a document that looks different from the original PDF.

The more common problems that can occur during PDF to DOCX conversion include:

  1. Text and image placement: Sometimes, the text and image placement can become distorted during conversion, causing the final document to look different from the original PDF.
  2. Formatting issues: PDFs often have complex formatting, such as columns, tables, and graphs. These elements can be difficult to convert to DOCX, leading to formatting issues in the final document.
  3. Fonts: If the PDF contains fonts that are not installed on the computer doing the conversion, the text can appear differently in the final document.
  4. Large files: PDF files can be very large, and converting them to DOCX can result in large files that take up a lot of storage space.
  5. Security features: Some PDFs have security features that prevent copying and pasting, which can make it difficult to convert the document to DOCX.
  6. OCR issues: If the PDF contains scanned images or text that was not originally digital, OCR (optical character recognition) software is needed to convert the text. However, OCR can sometimes produce errors or miss characters, leading to mistakes in the final document.
  7. Unnecessary tags: Any of the above problems can lead to many unnecessary control tags being inserted into the DOCX, and these become visible when working with a translation tool.
  8. Poor segmentation: Similarly, any of the above issues can lead to unnecessary hard returns being added to the DOCX, which makes translation more difficult than it needs to be.
  9. Incorrect character display: If the character encoding is incorrect, characters can be displayed incorrectly in the final document. For example, some characters may appear as question marks or boxes, especially with Asian character sets.
  10. Missing characters: In some cases, incorrect encoding can cause certain characters to be missing from the final document. This can result in text that is difficult to read or understand.
  11. Encoding conflicts: If different parts of the document are encoded in different ways, it can cause conflicts and errors during conversion. For example, some characters may be encoded in UTF-8 while others are encoded in ASCII, leading to errors when the document is converted.
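The encoding symptoms in the last three items can be reproduced in a few lines. The sketch below is only an illustration of how question marks, replacement characters and garbled letters arise when text is forced through the wrong encoding. It does not model how a PDF maps font glyphs to characters, and the function names are made up for this example.

```python
def lossy_ascii(text):
    """Force text into ASCII; every unsupported character becomes '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


def misdecode(text, wrong_encoding="latin-1"):
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    raw = text.encode("utf-8")
    # Bytes that are invalid in the wrong encoding become U+FFFD.
    return raw.decode(wrong_encoding, errors="replace")
```

For example, `lossy_ascii("번역")` returns `"??"`, `misdecode("é")` returns `"Ã©"`, and `misdecode("번역", "ascii")` returns six U+FFFD replacement characters, one for each UTF-8 byte that ASCII cannot represent.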

Note that the quality of the conversion depends largely on the quality of the original PDF and on the conversion software used; some conversion tools produce better results than others. This "Add-In" initially uses the Microsoft Word desktop API to provide simple text conversion and some OCR capability. You could simply use Word and avoid the "Add-In" altogether, but the plugin provides more support than Microsoft makes available through Microsoft Word, particularly around OCR.
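For readers curious what "simply using Word" looks like when automated, here is a minimal sketch that drives desktop Word over COM to open a PDF and save it as DOCX. It is not the plugin's implementation and does none of the plugin's image or OCR handling. It assumes Windows, an installed copy of Microsoft Word with PDF reflow support, and the third-party pywin32 package; the function names are illustrative.

```python
from pathlib import Path

WD_FORMAT_DOCUMENT_DEFAULT = 16  # Word's WdSaveFormat value for .docx


def docx_path_for(pdf_path):
    """Return the .docx path that sits next to the given PDF."""
    return Path(pdf_path).with_suffix(".docx")


def convert_pdf_with_word(pdf_path):
    """Open a PDF in desktop Word and save it as DOCX (Windows with Word only)."""
    # Imported here so the module still loads on machines without pywin32.
    import win32com.client

    source = Path(pdf_path).resolve()
    target = docx_path_for(source)
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Positional arguments: FileName, ConfirmConversions, ReadOnly.
        document = word.Documents.Open(str(source), False, True)
        document.SaveAs2(str(target), WD_FORMAT_DOCUMENT_DEFAULT)
        document.Close(False)  # close without saving changes to the source
    finally:
        word.Quit()
    return target
```

Word may still show a conversion prompt depending on its settings, and scanned pages come through as images, which is the gap the "Add-In" is intended to cover.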

Using the "Add-In"

Adding your files

The PDF Assistant for Trados is started by clicking on the icon in the ribbon. This opens a small wizard where you can add your files.

You can add as many files as you like, in as many languages as you like, but keep in mind that the process could take a considerable amount of time and may even run out of memory if you ask for too much. How many files you can process really depends on the number of pages, the number of images in each file, the amount of OCR work required, and so on. Think about the work you are about to carry out and don't expect miracles!

The files or folders can be added via drag and drop, or by using the small icons in the wizard. In this example two PDF files have been added: an English-language text containing two images, one that needs to be OCR'd and one that does not, and a Korean document that is non-readable, so its entire content is one big image in the PDF.

Selecting your Provider and OCR options

This screen allows you to do several things:

  1. Select the PDF Assistant you wish to use. For now there is only Microsoft Word to select from.
  2. Check the option to specify whether or not you wish to extract text from the images and, if so, choose (in the next screens) which ones you would like to be processed (OCR'd).

Keep in mind that when you OCR the images you will lose any background image that was there and will only have the text that the software was able to extract.

You can cancel the process at any time if the file is too complex for the application to manage.

Image Selection

This part of the wizard extracts the images the software was able to identify and allows you to specify which of them contain translatable text.

Summary Stage

This stage of the wizard displays a summary of the options you have chosen for the conversion.

Preparation

The final stage shows the progress of the conversion until it has completed.

DTP the converted files before Translation

Now you can open your converted PDF files as a DOCX in Microsoft Word and improve the quality of the file before you translate it. This way the target file will probably be ready to go, or at least require minimal editing to accommodate changes required as a result of text expansion/contraction in the target language.

A good tool for tidying up files resulting from a messy PDF conversion is TransTools, available at https://www.translatortools.net/products/transtools
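As a rough illustration of one clean-up task, the sketch below merges paragraphs that were split by the unnecessary hard returns mentioned under "Working with PDFs". It works on a plain list of paragraph strings, however you obtain them, and uses a simple heuristic of my own: a paragraph that does not end in sentence-final punctuation is assumed to continue in the next one. It is not how TransTools or the plugin work.

```python
SENTENCE_ENDINGS = (".", "!", "?", ":", ";", "。", "！", "？")
CLOSING_MARKS = "\"')]”’"


def ends_sentence(text):
    """True if the text ends in sentence-final punctuation, ignoring closing quotes or brackets."""
    return text.rstrip(CLOSING_MARKS).endswith(SENTENCE_ENDINGS)


def merge_broken_paragraphs(paragraphs):
    """Join paragraphs that appear to have been split mid-sentence."""
    merged = []
    for text in (p.strip() for p in paragraphs):
        if not text:
            continue  # drop empty paragraphs left behind by the conversion
        if merged and not ends_sentence(merged[-1]):
            merged[-1] = merged[-1] + " " + text
        else:
            merged.append(text)
    return merged
```

Headings and list items usually have no closing punctuation, so this heuristic would merge them into the following paragraph. Treat it as a way to flag candidates for review rather than as an automatic fix.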

In the example files, the English file contained two images, one that was OCR'd and the other treated as an image. The result isn't bad (PDF on the left, converted DOCX on the right), and if you were to open this PDF file in Microsoft Word both images would be handled as images, so the "Add-In" does provide considerable value here. The table needs tidying up, but it is editable and could save time when more extensive text is involved.

For the Korean non-readable PDF, some formatting would be required, but it's not too bad. The image is floating and can be positioned wherever I like, and all the text is available to me for translation. So with a small amount of DTP work I'll have a file that is easily translatable, and the target file should be good with minimum work required.
