
Architecture



Reactive Serverless

The Alquimia runtime is a reactive serverless platform built on Knative and Kubernetes. It is designed to be a scalable, flexible, and efficient foundation for building AI-driven workflows.

Overview

This Reactive Serverless Architecture is designed for enterprise-grade automation, offering scalability, flexibility, and cost efficiency. Its ability to process requests, generate context-aware responses, and deliver them across multiple channels makes it a powerful solution for modern AI-driven applications. The focus on observability and extensibility ensures that the system remains transparent, adaptable, and future-proof.

Inference Request

  • What it is: This is the entry point of the system where user queries or requests are captured from various sources and channels.

  • Key components:

    • Main Internet: Requests from general web applications or interactive platforms.

    • Local Intranet: Requests from localized or internal networks.

    • Email: Requests from email clients.

    • Voice Calls: Requests from voice call systems.

    • WhatsApp: Requests from WhatsApp.

    • Slack: Requests from Slack.

  • Purpose: To collect and categorize user queries from multiple communication channels (e.g., chat, email, voice calls) and prepare them for processing.

  • How it works: The system identifies the source of the request and formats it into a standardized structure for further analysis.
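The capture-and-standardize step above can be sketched as follows. This is a minimal illustration, not the Alquimia API: the channel names, the `InferenceRequest` fields, and the `capture` helper are all hypothetical stand-ins for the platform's real request structure.

```python
from dataclasses import dataclass, field
import time

# Hypothetical channel identifiers; the real runtime may use different names.
KNOWN_CHANNELS = {"web", "email", "voice", "whatsapp", "slack"}

@dataclass
class InferenceRequest:
    """Standardized structure every incoming query is mapped to."""
    channel: str    # source channel, e.g. "whatsapp"
    user_id: str    # channel-specific sender identifier
    text: str       # raw query text (or a transcript, for voice)
    received_at: float = field(default_factory=time.time)

def capture(channel: str, user_id: str, payload: str) -> InferenceRequest:
    """Identify the source of a request and wrap it in the standard form."""
    if channel not in KNOWN_CHANNELS:
        raise ValueError(f"unsupported channel: {channel}")
    return InferenceRequest(channel=channel, user_id=user_id, text=payload.strip())
```

Whatever the concrete fields are, the point is the same: every downstream stage sees one uniform structure regardless of which channel the query arrived on.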

Normalization

  • What it is: The process of transforming raw data (user queries) into a format that the system's models can understand and process.

  • Purpose: To ensure that the data is clean, contextualized, and ready for the next stages of processing.

  • How it works: The system uses advanced AI models to preprocess the query, extract key information, and enrich it with context.

Sequence of Thinking

  • What it is: A collaborative process where multiple deep-learning models work together to analyze the query and generate a response.

  • Key Components:

    • Parallel or Sequential Execution: Models can run in parallel (for speed) or sequentially (for dependency-based tasks) depending on the context. Multiple deep-learning models work together (in parallel or sequence) to analyze the query and gather relevant data.

    • Context-Aware Processing: The system uses the query and its context to determine the best approach for generating a response.

  • Purpose: To ensure that the system's response is accurate, relevant, and contextually appropriate.

  • How it works: The models analyze the query, retrieve relevant data, and prepare it for the generative response stage.
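The parallel-versus-sequential distinction above can be made concrete with `asyncio`. The two "models" here are trivial placeholders, not Alquimia components: independent steps are gathered concurrently for speed, while dependent steps run in order, each seeing the previous step's output.

```python
import asyncio

# Hypothetical analysis models; each returns a partial result for the query.
async def classify_intent(query: str) -> dict:
    return {"intent": "question"}

async def retrieve_facts(query: str) -> dict:
    return {"facts": [f"fact about {query!r}"]}

async def run_parallel(query: str) -> dict:
    """Independent models run concurrently for speed."""
    results = await asyncio.gather(classify_intent(query), retrieve_facts(query))
    merged: dict = {}
    for r in results:
        merged.update(r)
    return merged

async def run_sequential(query: str) -> dict:
    """Dependent steps run in order, each building on the previous output."""
    state = await classify_intent(query)
    if state["intent"] == "question":
        state.update(await retrieve_facts(query))
    return state
```

Choosing between the two is exactly the context-dependent decision the section describes: gather when the models are independent, chain when one model's output gates another.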

Generative Response

  • What it is: The stage where the system generates a response to the user's query using a large language model (LLM).

  • Key Components:

    • LLM Profile Selection: The system selects the most suitable language model (on-premises or in-cloud) based on the query and context.

    • Context: The response is generated using prompts that are aware of the query's context.

    • Adjacent Memory Management: The system uses memory to ensure the response is coherent and consistent with previous interactions.

  • Purpose: To provide a human-like, contextually relevant response to the user's query.

  • How it works: The selected LLM generates a response based on the query, context, and memory of past interactions.
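Profile selection and memory-aware prompting might look like the sketch below. The profile names, their attributes, and the selection criteria are invented for illustration; real profiles are defined per deployment.

```python
# Hypothetical profile registry; names and attributes are illustrative only.
PROFILES = {
    "on_prem_small": {"max_context": 4_096, "sensitive_ok": True},
    "cloud_large": {"max_context": 128_000, "sensitive_ok": False},
}

def select_profile(prompt_tokens: int, sensitive: bool) -> str:
    """Pick the first profile that fits the prompt size and data-sensitivity."""
    for name, p in PROFILES.items():
        if prompt_tokens <= p["max_context"] and (not sensitive or p["sensitive_ok"]):
            return name
    raise RuntimeError("no profile can serve this request")

def build_prompt(query: str, memory: list[str]) -> str:
    """Compose a context-aware prompt from the query and recent turns."""
    history = "\n".join(memory[-5:])  # a small adjacent-memory window
    return f"Previous interactions:\n{history}\n\nUser: {query}"
```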

Response Transformers

  • What it is: The process of transforming the system's response into a format that matches the user's dialect, language, or preferred style.

  • Key Components:

    • Templify Rules: Standardizes the response format for consistency.

    • Response Generation: Creates the final output in the desired format.

    • Memory Persistence: Stores relevant data for future interactions.

  • Purpose: To ensure the response is personalized, user-friendly, and consistent with the user's expectations.

  • How it works: The system adapts the response to match the user's language, tone, and communication style.
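The transformation step can be sketched as channel-specific style rules applied to the generated text. The rules table below is a toy stand-in for Templify Rules; the length limits and the formality rewrite are assumptions, not the platform's actual rule set.

```python
# Hypothetical per-channel style rules standing in for Templify Rules.
STYLE = {
    "slack": {"max_len": 4000, "prefix": ""},
    "whatsapp": {"max_len": 1600, "prefix": ""},
    "email": {"max_len": 100_000, "prefix": "Hello,\n\n"},
}

def transform(response: str, channel: str, formal: bool = False) -> str:
    """Adapt generated text to the channel's format and the user's style."""
    rules = STYLE[channel]
    text = rules["prefix"] + response
    if formal:
        # Toy tone adjustment: expand a couple of contractions.
        text = text.replace("can't", "cannot").replace("won't", "will not")
    return text[: rules["max_len"]]
```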

Source Connectors

  • What it is: The stage where the system delivers the response back to the original channel (e.g., chat, email, voice call).

  • Key Components:

    • Text Transformers: Adapt the response for different communication channels (e.g., WhatsApp, Slack, Email).

  • Purpose: To ensure the response is delivered seamlessly across multiple platforms.

  • How it works: The system routes the response to the appropriate channel and formats it for delivery.
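Routing can be sketched as a registry mapping each channel to a delivery function. The connector functions here just tag the text; real connectors would call each platform's API (Slack, WhatsApp, SMTP, and so on), and the registry shape is an assumption.

```python
from typing import Callable

# Hypothetical delivery functions; real connectors call platform APIs.
def send_slack(text: str) -> str:
    return f"[slack] {text}"

def send_email(text: str) -> str:
    return f"[email] {text}"

CONNECTORS: dict[str, Callable[[str], str]] = {
    "slack": send_slack,
    "email": send_email,
}

def deliver(channel: str, text: str) -> str:
    """Route the transformed response back to its originating channel."""
    try:
        return CONNECTORS[channel](text)
    except KeyError:
        raise ValueError(f"no connector registered for {channel!r}") from None
```

New channels plug in by registering one more function, which is the modularity the Benefits section credits to this design.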

Observability

  • What it is: The process of monitoring and logging the system's performance and behavior.

  • Key Components:

    • Metrics: Quantitative measurements of system behavior, such as request rates, latency, and resource usage.

    • Logs: Timestamped records of events across the pipeline, used for debugging and auditing.

    • Traces: End-to-end records of each request's path through the system, used to diagnose latency and failures.

  • Purpose: To provide transparency into the system's behavior and enable quick troubleshooting.
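A minimal sketch of how all three signals can hang off one instrumentation point: a context manager that, per pipeline stage, increments a counter (metric), records a duration (trace-like timing), and emits a log line. The logger name and storage structures are invented for illustration.

```python
import logging
import time
from collections import Counter
from contextlib import contextmanager

logger = logging.getLogger("alquimia.sketch")   # hypothetical logger name
metrics: Counter = Counter()                    # request counts per stage
timings: dict[str, float] = {}                  # last duration per stage, seconds

@contextmanager
def traced(stage: str):
    """Record a metric, a timing, and a log line for one pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start
        metrics[stage] += 1
        logger.info("stage=%s duration=%.4fs", stage, timings[stage])
```

In a real deployment these signals would be exported to dedicated backends rather than held in process memory.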

Server Tool Calling

  • What it is: The system's ability to integrate with external APIs and tools for additional functionality.

  • Key Components:

    • External API: Connects to third-party services for enhanced capabilities.

    • Custom Run: Executes custom scripts or processes as needed.

  • Purpose: To extend the system's functionality by leveraging external resources.

  • How it works: The system calls external APIs or runs custom scripts to perform specific tasks.
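Tool dispatch can be sketched as a registry of named callables, mixing an external-API tool with a trivial "custom run". The tool names and registry shape are assumptions; only standard-library HTTP and JSON calls are used.

```python
import json
from urllib import request as urlrequest

def fetch_json(url: str) -> dict:
    """External-API tool: call an HTTP endpoint and parse its JSON body."""
    with urlrequest.urlopen(url, timeout=10) as resp:
        return json.load(resp)

# Hypothetical tool registry; real tools are registered by the deployment.
TOOLS = {
    "http_get_json": fetch_json,
    "shout": lambda text: text.upper(),   # a trivial "custom run" script
}

def call_tool(name: str, *args):
    """Dispatch a tool call requested during response generation."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](*args)
```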

External Knowledge Base

  • What it is: A repository of external data that the system can access to enhance its responses.

  • Key Components:

    • Graphs: Structured data for relationships and connections.

    • Databases: Stores and retrieves information.

    • Embeddings: Semantic representations of data for better understanding.

  • Purpose: To provide the system with additional knowledge and context for generating accurate responses.

  • How it works: The system queries the knowledge base to retrieve relevant information for the response.
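The embeddings-based retrieval path can be illustrated with a deliberately tiny stand-in: word-count vectors in place of a real embedding model, ranked by cosine similarity. Everything here is a sketch of the idea, not the platform's retrieval implementation.

```python
import math

def embed(text: str) -> dict[str, int]:
    """Toy embedding: a sparse word-count vector (stand-in for a real model)."""
    vec: dict[str, int] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]
```

A production system would swap in learned embeddings and a vector index, but the retrieve-then-generate flow is the same.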

Output

  • What it is: The final output of the system, which is the response to the user's query.

  • Purpose: To provide a response that is accurate, relevant, and context-aware.

  • How it works: The system formats the response into a standardized structure for further processing and delivery.

Benefits

  • Scalability: The serverless architecture ensures the system can handle high workloads efficiently.

  • Flexibility: Modular components allow for easy integration with external tools and services.

  • Cost Efficiency: Scale-to-zero and dynamic resource allocation minimize costs.

  • Personalization: Context-aware processing and response transformers ensure user-friendly interactions.

  • Observability: Real-time monitoring provides transparency and enables quick troubleshooting.

  • Extensibility: Integration with external APIs and knowledge bases enhances the system's capabilities.
