How we provided software development services for the e-discovery project
About the client
ZyLAB is an e-discovery software development company, established in 1983.
How it works
ZyLAB One is an innovative SaaS platform that helps automate the e-discovery process. It provides easy-to-use yet advanced search capabilities, automated redactions, and powerful analytics that leverage artificial intelligence.
The platform works like this:
- The user uploads all kinds of data to the ZyLAB One environment with a standard browser or connects directly to the locations where documents reside (Microsoft Office 365 or Google Workspace).
- The platform processes all information (it runs OCR on image files, analyzes email conversations, extracts compound formats, and so on) to make it available for investigation or review.
- The platform leverages structural and semantic insights and presents the information so that the user can make strategic decisions based on facts.
- The dashboard provides insight into the collected and processed data for early case assessment or as a starting point for a more thorough data investigation.
- The user can preview results in over 750 formats, quickly navigate through the documents, and tag or label them.
- The user can save the review data in various formats, including Concordance or EDRM XML.
Client business challenges
Initially, ZyLAB software was developed as a desktop application. The company was no longer satisfied with the system’s performance. The client contacted the IntelliSoft team because it needed to extend its R&D team with experienced dedicated developers.
What we did
Legacy platform migration
The previous product, the on-premise ZyLAB IM Platform (Information Management Platform) for extensive data analysis, ran on ZyLAB clients’ servers.
It allows the user to select different sources for data collection, extract and parse the data, turn it into text, index it, and make it available for keyword search.
One example is a company that must store a lot of data for an investigation, where the data is unstructured and spread across different sources. The software can take data from images, videos, and audio and turn it into text.
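The index-and-search step described above can be illustrated with a minimal inverted index. This is only a sketch under stated assumptions: the function names, the `documents` dictionary, and the simple alphanumeric tokenizer are hypothetical and are not taken from the ZyLAB codebase.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it.

    `documents` is a hypothetical dict of {doc_id: extracted_text},
    standing in for text produced by parsing or OCR.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every keyword in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

Once text from every source has been extracted, a single `build_index` call makes all of it searchable by keyword, regardless of the original file type.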
The initial desktop platform included many components and leveraged many services for big data analysis. With time, those components became outdated.
By the time we joined the project, the in-house development team had been working on the platform split for 1 year and 6 months. They separated the core functionality from the desktop app, adapted the old code, and integrated it into the new SaaS web-based product, ZyLAB One, to streamline the implementation of new functionality. The IntelliSoft team was responsible for the project infrastructure for proper code deployment and testing.
The client wanted us to make ZyLAB One work more effectively. To achieve this goal, we are reworking the relationships between services to make them more independent.
We are wrapping the app’s functionality into Docker containers and running them in Kubernetes clusters so that the software does not depend on the environment.
Docker uses OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries, and configuration files. Kubernetes, also known as K8s, is an open-source system for automating the deployment, scaling, and management of containerized applications.
One example is the optical character recognition (OCR) component that requires libraries and trained neural network models to recognize characters.
When you set ZyLAB One up on a new server, you also need to add the models, dependencies, libraries, etc. so that the service can recognize characters.
Containerization makes the software environment-agnostic and keeps all the necessary materials in one place. When the user needs to install the service on a new Microsoft Azure server, they deploy a single container that includes all the required dependencies instead of deploying separate files.
In this case, Kubernetes manages the containers, together with their dependencies, across different servers. The user can run several containers simultaneously in the cloud. If one server’s capacity is insufficient, the user can bring in another server for a container.
Tesseract OCR integration
The client’s software used the ABBYY optical character recognition solution for the image-to-text feature, and the client paid fees according to the subscription plan. To reduce operating costs, the client’s in-house team integrated Tesseract 3, an open-source character recognition engine, as an option.
The IntelliSoft developers upgraded Tesseract from version 3 to version 5, which performs better, and created the initial implementation of an image pre-processing pipeline to improve OCR quality.
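The source does not say which steps the pre-processing pipeline contains. As an illustration of the kind of step such a pipeline commonly includes, here is a minimal sketch of binarization with Otsu's threshold, written in plain Python over a flat list of grayscale values. The function names and the input representation are assumptions made for this example, not the project's actual code.

```python
def otsu_threshold(pixels):
    """Return the grayscale level (0-255) that maximizes the
    between-class variance of the two pixel groups it separates."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += histogram[level]
        if weight_bg == 0:
            continue  # no pixels at or below this level yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the background class
        sum_bg += level * histogram[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(pixels, threshold):
    """Map pixels at or below the threshold to black (0), the rest to white (255)."""
    return [0 if value <= threshold else 255 for value in pixels]
```

Binarization separates dark text from a lighter background before recognition. A real pipeline would typically combine it with other steps such as deskewing or noise removal.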
The team has built an automated quality comparison tool that helps fine-tune the pre-processing algorithms, and the developers are currently preparing the second version of the pipeline to make Tesseract the default OCR engine.
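The source gives no details about how the comparison tool scores OCR output. One common way to compare engines is the character error rate (CER): the edit distance between the recognized text and a known-correct reference, divided by the reference length. The sketch below assumes that metric; all names in it are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # drop a reference character
                current[j - 1] + 1,      # extra character in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by reference length; lower is better."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def compare_engines(reference, outputs):
    """Return (engine_name, CER) pairs sorted from best to worst.

    `outputs` is a hypothetical dict of {engine_name: recognized_text}.
    """
    scores = {name: character_error_rate(reference, text)
              for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

Running the same set of reference pages through each engine, or through each pre-processing variant, and comparing the CER values gives a repeatable way to decide whether a pipeline change actually helps.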
Kubernetes also allows launching several servers with containers when the user uploads a massive amount of data to the queue. It dynamically allocates computing resources to containers across the cluster, which improves the service’s performance. Once the queue is empty, Kubernetes turns off the extra servers so that users don’t pay for resources they don’t use.
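The scaling behavior described above can be sketched as a simple rule that derives the number of processing containers from the queue length. This is a simplified illustration, not the Kubernetes autoscaler algorithm itself; the function name, the `items_per_worker` target, and the worker limits are all assumptions.

```python
import math


def desired_workers(queue_length, items_per_worker, min_workers=1, max_workers=10):
    """Number of processing containers to run for the current queue size.

    All parameters are hypothetical: `items_per_worker` is the backlog one
    container is expected to handle, and the min/max values cap the result.
    """
    if items_per_worker <= 0:
        raise ValueError("items_per_worker must be positive")
    needed = math.ceil(queue_length / items_per_worker)
    return max(min_workers, min(max_workers, needed))
```

With a rule like this, a large upload raises the worker count up to the configured ceiling, and an empty queue brings it back down to the minimum so that idle capacity is not billed.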
Another Kubernetes feature is the advanced isolation environment layer. Suppose many users upload their data to the platform for analysis. Isolation eliminates cases where one user receives another user’s analyzed data, or where a broken file upload causes a system failure.
Moreover, such a system is much easier and safer to deploy. If the user has problems deploying the container’s new version, they can stop the deployment and roll back to the container with the old version.
In this way, the solution eliminates downtime. In contrast, in an on-premise environment, the user deletes the old software version to install the new one. If the latest version crashes, the user has to delete it and reinstall the old version, which causes downtime.
The IntelliSoft team implemented a proof of concept (POC) for a containerized processing node service and is now continuing to containerize other parts of the product. We are also considering migrating all services to containers in a Kubernetes cluster soon.
Project tech stack
Results we achieved
The client has high standards for developers because the product processes large amounts of data, has a complex, scalable infrastructure, and must meet high security standards. Thus, it was challenging to find specialists with the expertise to match those requirements.
The IntelliSoft team has met the client’s requirements and assembled a team of 6 dedicated software specialists, who have been working on the project for about a year.
The project is ongoing and we have a large tech track planned, so we are continuing to work on improvements to make ZyLAB One an even more scalable, robust, feature-rich, and cost-effective solution.
At the end of 2021, ZyLAB decided to broaden the scope of cooperation, and IntelliSoft is going to create a new team to work on the ZyLAB Insights product.