The turmoil caused by the introduction of the GDPR has subsided somewhat. But even if you have organized everything well, it remains on every organization's agenda. Essentially, the GDPR governs how identifiable personal data may be handled. Under the principle of Privacy by Design, software and services must now be built to collect only the minimum amount of personal data needed for a particular purpose.

Many government agencies and other large organizations still need to anonymize existing documents that contain personal information (author names, teachers, etc.) before they can publish them through their portals. Examples include archives, books, and theses. But which information do you have to anonymize under the GDPR (DSGVO), and how can you do this correctly and efficiently?

The GDPR no longer applies to completely anonymized data. Anonymization is therefore a useful technique when an organization wants to keep data for publication or statistical purposes but tracing it back to individuals is no longer necessary or lawful. However, full anonymization is not easy. For each situation, you have to check which data must be removed or replaced so that no natural person can be identified.

Solution: the DocYard Anonymizing Module

A new module for the DocYard document processing platform allows customers to anonymize personal data efficiently and in bulk. The anonymization is irreversible: the anonymized personal data can no longer be traced or reconstructed. Typical examples are building permits, environmental permits, and similar documents that public administrations want to make available online to citizens.

Thanks to its modular structure, DocYard can perform anonymization either as part of the digitization process or as a separate processing run on an existing digital archive. Because licensing is hardware-independent, production capacity can also be scaled easily to the desired level by adding module managers.

To anonymize an existing digital archive, you can configure a DocYard workflow with the following process steps:

  1. Analyze the existing content and determine which personal data must be anonymized in which document types. Alternatively, you can submit the entire archive; in that case, the document type is determined by automatic classification.
  2. Define one or more form designs (i.e. rules) that automatically recognize personal information. Because these rules can be saved and reused later, this saves time and reduces the chance of errors.
  3. Import and configure the workflow in DocYard, then start the job to process the data.
  4. The documents (any format is possible, such as TIFF, JPEG, or PDF) are read, and the software determines which search criteria to use for each document.
  5. The software then searches each document and determines which of these search criteria match.
  6. The search results are shown to the user in the validation module, where the user can confirm the detected fields, reject them, or add further fields manually.
  7. The anonymization module performs two actions on the confirmed fields: graphically, the selected text is replaced by a black (or other colored) bar, and the associated OCR text is removed from the document.
  8. Finally, the document is converted into a full-text searchable PDF/A file using Foxit MRC compression technology.
  9. Optionally, all anonymized fields of each document can be written to an audit file. This file can also be used when fields have been pseudonymized, in which case it records both the original value and its alias.
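
The rule-based detection, validation, redaction, and audit steps (steps 2 and 5 to 7, plus the optional audit file in step 9) can be illustrated with a small Python sketch. It is not DocYard's API, which this article does not describe: the `Word` and `Page` classes, the `RULES` dictionary standing in for a form design, and the `find_candidates` and `redact` functions are all hypothetical. The sketch assumes OCR has already produced words with bounding boxes, treats every candidate as confirmed in place of the validation module, and only records the bar rectangles instead of drawing them on the image or producing a PDF/A file.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical sketch of rule-based redaction; NOT the DocYard API.


@dataclass
class Word:
    """One OCR word with its bounding box (x0, y0, x1, y1) in page pixels."""
    text: str
    box: tuple[int, int, int, int]


@dataclass
class Page:
    words: list[Word]
    # Rectangles to paint as opaque bars when the page image is rendered.
    bars: list[tuple[int, int, int, int]] = field(default_factory=list)


# Hypothetical "form design": field name -> pattern a whole OCR word must match.
RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"(?:\+\d{2}|0)[\d-]{7,}\d"),
}


def find_candidates(page: Page, rules: dict[str, re.Pattern]) -> list[tuple[str, int]]:
    """Search step: return (field name, word index) for every matching word."""
    hits = []
    for i, word in enumerate(page.words):
        for name, pattern in rules.items():
            if pattern.fullmatch(word.text):
                hits.append((name, i))
                break  # first matching rule wins
    return hits


def redact(page: Page, confirmed: list[tuple[str, int]],
           aliases: dict[str, str] | None = None) -> list[dict]:
    """Bar out confirmed words, blank their OCR text, and return audit rows."""
    aliases = aliases or {}
    audit = []
    for name, i in confirmed:
        word = page.words[i]
        page.bars.append(word.box)          # graphical redaction: bar over the word
        row = {"field": name, "box": word.box}
        if word.text in aliases:            # pseudonymized: record original and alias
            row["original"] = word.text
            row["alias"] = aliases[word.text]
        audit.append(row)
        word.text = ""                      # remove the text from the OCR layer
    return audit


page = Page([
    Word("Applicant:", (10, 10, 90, 25)),
    Word("jan.devries@example.org", (95, 10, 260, 25)),
    Word("Tel", (10, 30, 35, 45)),
    Word("+31-20-5551234", (40, 30, 150, 45)),
    Word("2019/042", (10, 50, 80, 65)),
])
candidates = find_candidates(page, RULES)   # [("email", 1), ("phone", 3)]
confirmed = candidates                      # stand-in for the manual validation step
audit = redact(page, confirmed, {"jan.devries@example.org": "PERSON-001"})
```

Matching whole OCR words keeps the sketch short; personal data such as names usually spans several words and needs context-aware rules, which is one reason a manual validation step remains useful. Audit rows that hold original values make those fields re-identifiable, so the sketch only stores them for pseudonymized fields.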

Result: by setting up smart workflows in DocYard, organizations can easily and irreversibly anonymize large volumes of documents and data. All personal data identified in the process is removed, helping organizations comply with the European GDPR (DSGVO).
