Transac AI

The Transac AI project generates enriched summaries and insights from complex transactional data, in real time or in batch, using Generative AI and Large Language Models (LLMs).

Imagine fine-tuned LLMs serving as specialists, working several times more efficiently than multiple human analysts and capable of analyzing hundreds of transactions per second to generate insights and support compliance, risk management, and fraud detection, all in real time. This is what Transac AI aims to achieve.

Overview

Let's start with a brief overview of the problem Transac AI is trying to solve and how it aims to solve it.

The Problem

Real-time and Scalable Insights Extraction in Natural Language.

The problem: Conventionally, sophisticated data analysis and processing tools churn through vast amounts of transactional data to produce visualizations, reports, and metrics, which human decision-makers and experts then evaluate to manually extract the insights needed for informed decisions. This process is time-consuming, error-prone, and inefficient. Building a real-time system on this architecture is a challenge, since it involves many moving pieces and complex integrations with manual interventions, which invite errors and inconsistencies. Scaling such a system is equally hard, since the final insights extraction depends on human evaluators of the analyzed data.

The Solution

Transac AI.

The solution: Transac AI is a real-time and scalable system that uses Generative AI and Large Language Models (LLMs) to automatically generate enriched summaries and insights from complex transactional data.

When analyzing transactional data, the ultimate goal is generally to produce valuable insights that support tasks like decision-making, compliance, risk management, and fraud detection. Usually, the analysis of such data is used by a human expert at the end of the pipeline to perform a certain action. For example, based on the spending patterns of employees, a manager may choose to edit the budgeted amounts for different categories or follow up with an employee over a breach of a compliance policy. This is where Transac AI comes in.

Instead of passing the transactional data through cycles of data analysis and visualization tools, Transac AI uses Large Language Models (LLMs) to generate insights directly from the raw transactional data, in real time and in natural language optimized for human understanding. It shifts the onus of extracting relevant insights from processed data away from human evaluators and automates the process using AI and specialized LLMs.

Features

Some of the core features of Transac AI include:

  • Real-time Insights Generation: Transac AI is capable of generating insights from transactional data in real-time.
  • Scalable: Transac AI is built using a highly scalable, available, and reliable Kubernetes-based distributed microservices & event-driven architecture.
  • Generative AI: Transac AI uses Generative AI and Large Language Models (LLMs) to generate insights from transactional data in natural language.
  • Low-latency and High-throughput: Transac AI uses low-latency, high-throughput interconnections between services via gRPC, Apache Kafka, Connect RPC, and GraphQL.
  • Automated Compliance, Risk Management, and Fraud Detection: Transac AI can be used to automate compliance, risk management, and fraud detection tasks using sophisticated prompts and fine-tuned LLMs.

Use Cases

Transac AI can be used in a variety of use cases including:

  • Compliance Automation: Automate compliance tasks using real-time insights generated by Transac AI.
  • Risk Management: Use Transac AI to generate insights that can help in risk management tasks.
  • Fraud Detection: Automate fraud detection tasks using Transac AI.
  • Decision-making: Use Transac AI to generate insights that can help in decision-making tasks.
  • Spending Analysis: Analyze spending patterns using Transac AI to generate insights that can help in budgeting and forecasting.
  • Employee Monitoring: Monitor employee spending patterns using Transac AI to generate insights that can help in compliance and risk management.

Architecture

Transac AI is built using a highly scalable, available, and reliable Kubernetes-based distributed microservices and event-driven architecture. This is enabled through low-latency, high-throughput interconnections between services using gRPC, Apache Kafka, Connect RPC, and GraphQL.

Core services are built using Python, Node.js, and Go, and are deployed and managed using Terraform on Google Cloud and AWS, with services like Google Kubernetes Engine (GKE), AWS EventBridge Scheduler, AWS Lambda, Confluent Kafka, Supabase, and more.

The image below shows the flow of generating insights from transactional data and provides a high-level overview of Transac AI's distributed, event-driven architecture.

Transac AI - Insights Generation & Overall Architecture

Microservices

Transac AI is built using a set of core microservices that work together to generate insights from transactional data in real-time or in batch.

All of these core services were containerized using Google Cloud Build and deployed on the Google Kubernetes Engine (GKE), with their images stored in Google Artifact Registry.

Let's take a look at some of the core services and the role they play in the system.

Workload Manager Service (WMS)

The Workload Manager Service (WMS) sits at the forefront of the insights generation pipeline. Every new request for generating insights from transactional data is first received by WMS, which records the request for tracking and monitoring purposes and passes it on to the Insights Generation Service (IGS) for further processing.

As is evident, this service must be highly available and scalable to handle a large number of incoming requests and process them in a timely manner. Therefore, WMS is a Go-based Connect RPC server deployed on a Kubernetes cluster in the Google Kubernetes Engine (GKE) to ensure high availability and scalability.

WMS supports multiple HTTP protocol versions (HTTP/1.1, HTTP/2, and HTTP/3) for receiving requests from clients. This allows WMS to interact with other Transac AI microservices over gRPC for low-latency, high-throughput communication, while letting web or mobile front-ends submit requests using plain HTTP or the Connect Protocol.
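Because the Connect Protocol maps unary RPCs onto plain HTTP POSTs with JSON bodies, a client request to WMS can be built without any gRPC tooling. The following sketch assembles such a request; the service path and field names are illustrative assumptions, not the actual WMS schema.

```python
# Hypothetical sketch of a Connect-protocol call to WMS: a plain HTTP POST
# with a JSON body. Route and field names below are assumptions.
import json

def build_connect_request(client_id: str, from_time: str, to_time: str):
    """Build the URL path, headers, and body for a unary Connect-protocol call."""
    path = "/wms.v1.WorkloadManagerService/SubmitRequest"  # assumed route
    headers = {
        "Content-Type": "application/json",   # Connect's JSON codec
        "Connect-Protocol-Version": "1",      # identifies the Connect protocol
    }
    body = json.dumps({
        "clientId": client_id,
        "fromTime": from_time,
        "toTime": to_time,
    })
    return path, headers, body

path, headers, body = build_connect_request(
    "acme", "2024-01-01T00:00:00Z", "2024-01-01T01:00:00Z"
)
```

The same payload could be sent with any HTTP client, which is what makes the Connect Protocol convenient for web and mobile front-ends.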

To learn more about WMS, check the WMS GitHub Repository.

Requests Storage Service (RSS)

The Requests Storage Service (RSS) is a GraphQL API service that stores and retrieves insights generation requests.

All new insights generation requests are started from the Workload Manager Service (WMS). On getting a new manual request for insights generation or on starting a new scheduled job, the WMS service first stores the request details through the RSS service to get the request_id. For manual requests, it also returns this ID to the frontend client. Throughout the rest of the insights generation process cycle, this request_id is used for tracking insights generation request status, for tracing requests, for observability, and for performance monitoring and analysis.

Frontend clients can also directly interact with the RSS service to get request details like status of a certain request or all requests for a certain client.
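For illustration, a frontend status lookup against RSS's GraphQL API could be shaped like the sketch below; the operation, type, and field names are assumptions, not the actual RSS schema.

```python
# Hypothetical GraphQL payload a frontend might send to RSS to check
# the status of a request by its request_id.
def build_status_query(request_id: str) -> dict:
    query = """
    query RequestStatus($id: String!) {
      request(requestId: $id) {
        requestId
        status
        createdAt
      }
    }
    """
    # Standard GraphQL-over-HTTP shape: the query plus its variables.
    return {"query": query, "variables": {"id": request_id}}

payload = build_status_query("req-123")
```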

The RSS service is secured with API keys. Each frontend client can be assigned a unique API key, enabling access control as well as usage tracking and monitoring.
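A minimal sketch of the kind of per-request API-key check RSS could perform is shown below; the key store and lookup scheme are assumptions for illustration.

```python
# Hypothetical API-key check: look up the client's expected key and compare
# it against the presented key in constant time.
import hmac

API_KEYS = {"frontend-web": "s3cr3t-key"}  # illustrative in-memory key store

def is_authorized(client_id: str, presented_key: str) -> bool:
    expected = API_KEYS.get(client_id)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking key contents via timing.
    return hmac.compare_digest(expected, presented_key)
```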

RSS is written in TypeScript using the Prisma Client ORM and the GraphQL Apollo Server, and is deployed in the same Kubernetes cluster as the WMS service in GKE, behind an internal load balancer, to ensure low-latency communication between the two services.

To learn more about RSS, check the RSS GitHub Repository.

Task Scheduler Service (TSS)

To support batch jobs that run at specific intervals, the Task Scheduler Service (TSS) was created. TSS is responsible for scheduling and running batch jobs for insights generation. For example, a client may set up a job to generate insights every 10 minutes. Such intervals help keep the number of records analyzed by the Large Language Model (LLM) optimal while ensuring near real-time insights generation.

TSS is essentially a Node.js and Fastify-based REST API service written in TypeScript that is deployed on the same Kubernetes cluster as the WMS service in GKE. It interacts with a Supabase-hosted PostgreSQL database to store job schedules and job statuses.
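The interval scheduling described above can be reasoned about with a small sketch: given a job's start time and interval, compute its upcoming run times. The function and field names are illustrative, not the TSS schema.

```python
# Sketch of TSS-style interval scheduling: derive the next run times
# for a job that fires every `interval_minutes` minutes.
from datetime import datetime, timedelta

def next_runs(start: datetime, interval_minutes: int, count: int):
    """Return the next `count` run times after `start` for an interval job."""
    return [start + timedelta(minutes=interval_minutes * i)
            for i in range(1, count + 1)]

# A job created at 09:00 with a 10-minute interval fires at 09:10, 09:20, 09:30.
runs = next_runs(datetime(2024, 1, 1, 9, 0), 10, 3)
```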

Insights Generation Service (IGS)

The Insights Generation Service (IGS) is the core service responsible for generating insights from transactional data using Generative AI and Large Language Models (LLMs).

Core functionalities of the IGS are:

  1. Receives a request through gRPC from the Workload Manager Service (WMS) to generate insights from transactional data.
  2. Uses gRPC to communicate with the Prompt Builder Service (PBS) to get the prompt to be used to generate insights. This prompt also contains the transactional data to be analyzed.
  3. Generates insights by passing the prompt to the Large Language Model (LLM).
  4. Saves the generated insights in the database using the Insights Storage Service (ISS).
  5. Sends an update on a Kafka topic to inform active clients about the generated insights.
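The steps above can be sketched as a single orchestration function. The classes here are trivial in-memory stand-ins for the real gRPC, LLM, ISS, and Kafka clients, not actual Transac AI APIs.

```python
# Illustrative IGS orchestration with stand-in dependencies.
class StubPBS:
    def build_prompt(self, request):
        return f"Analyze records for {request['client_id']}"

class StubLLM:
    def generate(self, prompt):
        return {"summary": f"insights for: {prompt}"}

class StubISS:
    def __init__(self):
        self.saved = {}
    def save(self, req_id, insights):
        insights_id = f"ins-{req_id}"
        self.saved[insights_id] = insights
        return insights_id

class StubProducer:
    def __init__(self):
        self.messages = []
    def publish(self, topic, payload):
        self.messages.append((topic, payload))

def generate_insights(request, pbs, llm, iss, producer):
    # Step 1 (receiving the gRPC request from WMS) is handled by the server layer.
    prompt = pbs.build_prompt(request)                    # step 2: fetch prompt from PBS
    insights = llm.generate(prompt)                       # step 3: run the LLM
    insights_id = iss.save(request["req_id"], insights)   # step 4: store via ISS
    producer.publish("new_insights",                      # step 5: notify subscribers
                     {"req_id": request["req_id"], "insights_id": insights_id})
    return insights_id
```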

In the Insights Generation section below, we will discuss the insights generation process in more detail.

IGS is essentially a Connect RPC server written in TypeScript. It is deployed in the same Kubernetes cluster as the WMS, PBS, and ISS services in GKE to ensure low-latency communication between the services. Since IGS only interacts with internal services, it is not accessible from outside the Kubernetes cluster.

To learn more about IGS, check the IGS GitHub Repository.

Prompt Builder Service (PBS)

The Prompt Builder Service (PBS) is one of the core services of Transac AI. This service is responsible for preparing the prompt that will then be used by the Insights Generation Service (IGS) to generate insights from transactional data.

The PBS is a gRPC service that takes in a request with the following information:

  • req_id: A unique identifier for a request for idempotency, traceability, and debugging.
  • client_id: A unique identifier for a client.
  • prompt_id: ID of the prompt template to be used for generating insights. This allows the client to use different templates for different types of insights and data.
  • records_source_id: ID of the source of the transactional data records. This allows the client to use different sources of transactional data for generating insights, and also helps improve resiliency by allowing the client to use different sources in case one source is down.
  • prompt_templates_source_id: ID of the source of the prompt templates. This allows the client to use different sources of prompt templates for generating insights, and also helps improve resiliency by allowing the client to use different sources in case one source is down.
  • from_time: Start time of the transactional data to be used for generating insights.
  • to_time: End time of the transactional data to be used for generating insights.

The PBS then prepares the prompt by fetching the prompt template from the database, filling in the placeholders in the template with the transactional data, and returning the prompt to the client.
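The template-filling step can be sketched as follows; the template text, placeholder names, and record fields are illustrative assumptions, not the actual PBS templates.

```python
# Sketch of PBS-style prompt assembly: render transaction records into a
# prompt template fetched for a given client and time window.
def build_prompt(template: str, records: list, client_id: str) -> str:
    """Fill the template's placeholders with a textual rendering of records."""
    records_text = "\n".join(
        f"- {r['timestamp']}: {r['amount']} ({r['category']})" for r in records
    )
    return template.format(client_id=client_id, records=records_text)

template = (
    "Summarize spending for client {client_id} and flag compliance risks:\n"
    "{records}"
)
prompt = build_prompt(
    template,
    [{"timestamp": "2024-01-01T09:05Z", "amount": "120.50", "category": "travel"}],
    "acme",
)
```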

The PBS is written in Python using the gRPC Python library. It is deployed in the same Kubernetes cluster as the IGS service to ensure low-latency communication between the services.

To learn more about PBS, check the PBS GitHub Repository.

Insights Storage Service (ISS)

The Insights Storage Service (ISS) is a GraphQL API service that stores and retrieves generated insights.

The Insights Generation Service (IGS) generates insights from transactional data and sends them to ISS for storage. ISS stores these insights in a PostgreSQL database and provides an API for querying and retrieving them. Frontend clients can use the ISS GraphQL API to fetch insights.

The ISS API is publicly accessible but is secured through API keys, which are required for all requests.

Like the RSS service, the ISS service is written in TypeScript using the Prisma Client ORM and the GraphQL Apollo Server, and is deployed in the same Kubernetes cluster as the IGS service to ensure low-latency communication between the two services.

To learn more about ISS, check the ISS GitHub Repository.

Real-time Updates

Transac AI provides real-time updates to clients about the status of insights generation requests and the generated insights.

To achieve this, Transac AI uses Apache Kafka as a message broker. The Insights Generation Service (IGS) publishes updates about insights generation requests and the resulting insights to the new_insights Kafka topic, and all active clients subscribed to that topic receive them in real time.
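Kafka message values are bytes, so an update like the one described above would be serialized before publishing. The sketch below shows one plausible message shape; the field names are assumptions, not the actual topic schema.

```python
# Hypothetical shape of an IGS update message for the new_insights topic,
# JSON-serialized to bytes as a Kafka producer would require.
import json

def encode_update(req_id: str, client_id: str, insights_id: str) -> bytes:
    payload = {
        "req_id": req_id,
        "client_id": client_id,
        "insights_id": insights_id,
        "status": "COMPLETED",
    }
    # The resulting bytes are what a producer would send as the record value.
    return json.dumps(payload).encode("utf-8")

message = encode_update("req-123", "acme", "ins-456")
```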

The Workload Manager Service (WMS) also subscribes to the new_insights Kafka topic to receive updates about the generated insights, and update request status accordingly through the Requests Storage Service (RSS).

Clients can use the RSS service to manually get the status of requests. All fulfilled requests will have an associated insights_id, which the client can use to fetch generated insights from the Insights Storage Service (ISS).