Cisco's AI Ecosystem Leverages Google Cloud for Multi-Regional Microservices

May 1, 2026 · Juan Netapp · 5 min read

This video from Outshift by Cisco covered a lot of ground. 7 segments stood out as worth your time. Everything below links directly to the timestamp in the original video.

Understanding this architecture reveals how major enterprises are building resilient, scalable AI platforms capable of integrating diverse data sources and supporting a wide array of intelligent agents.

Cisco's AI Ecosystem Leverages Google Cloud for Multi-Regional Microservices

Cisco's AI ecosystem is built predominantly on Google Cloud Platform (GCP), utilizing microservices deployed across multiple regions in the U.S. and Asia. The architecture incorporates an Anthos service mesh to ensure failover and fault tolerance, while standardized Apache Spark jobs act as connectors, pulling data from both on-premise and cloud enterprise sources. This content is then uploaded to Google Cloud Storage and processed via Pub/Sub, with Google DataProc and Dataflow handling job execution and data processing.

The system includes ingestion and retrieval services, alongside various AI agents designed for tasks such as generating PowerPoint presentations with Cisco-approved templates, conducting deep research, and performing data analysis on uploaded CSV and Excel files. Administration services manage an API portal and agent registry, with core services like LLM as a service and RAG services integrated. All transactions are audited in BigQuery, and the platform incorporates AI defense mechanisms for security and connects to Cisco's deep network model and on-premise agents for seamless, secure enterprise-wide AI capabilities.

▶ Watch this segment — 22:20

Cisco Implements LangGraph-Based RAG for Enhanced Answer Generation

Cisco has implemented a Retrieval-Augmented Generation (RAG) system utilizing LangGraph, where an agent first generates multiple SEO queries from a user prompt using a Large Language Model (LLM). These queries are then simultaneously sent to both keyword and vector search engines. The results from both search avenues undergo a multi-stage reranking process, initially by a neural reranker deployed on an L4 GPU node in GCP, followed by another level of reranking to select the most relevant data chunks.

The top-ranked chunks are subsequently fed into a GPT-4 mini LLM to generate the final answer, prioritizing low latency with a reported time-to-first-token of 4 to 6 seconds. The entire process, from prompt to final answer, is meticulously audited, with every data point and interaction logged via Pub/Sub and stored in BigQuery for debugging and traceability purposes, ensuring robust and transparent AI operations.

▶ Watch this segment — 18:43

Automated Infrastructure Provisioning Powers Cisco's RAG as a Service

Cisco's RAG as a Service (RaaS) automates the provisioning of critical infrastructure components on Google Cloud Platform (GCP) across multiple regions in the U.S. and Asia. Key components include Bigtable for low-latency, high-availability metadata storage, Octa for secrets management, and Google Cloud Storage buckets for data. Google Deployment Manager facilitates the automated creation of Pub/Sub topics and associated service accounts.

For search capabilities, Elasticsearch clusters are provisioned for keyword search, with routing determined by data volume, and Pinecone serverless is used for vector database storage, including namespace creation for semantic content. A suite of microservices is also provisioned per client to manage data processing, with users receiving automated notifications containing API documentation, credentials, and endpoints once their infrastructure is ready.

▶ Watch this segment — 17:28

Cisco's RAG System Features Auditable Content Ingestion via Google Cloud Storage

Cisco's Retrieval-Augmented Generation (RAG) system incorporates a robust content ingestion process that allows users to upload content to Google Cloud Storage buckets using short-lived service tokens. Upon upload, the reference URL of the content is sent to Pub/Sub, triggering a series of microservices. These microservices are responsible for processing the documents, including text extraction, chunking, and the generation of embeddings.

Subsequently, the processed content is indexed into both Elasticsearch for keyword search and Pinecone for vector search. A critical aspect of this ingestion pipeline is its comprehensive auditability. Every document and every interaction with microservices generates an audit event, providing end-to-end traceability that allows users to query BigQuery to monitor the status and history of any particular document within the system.

▶ Watch this segment — 20:08

Cisco Offers LLM as a Service with Standardized API and Tiered Access

Cisco provides an "LLM as a Service" platform, granting its employees access to various Large Language Models (LLMs) through a standardized API. This API allows employees to develop Generative AI (GenAI) agents and integrate them into their embedded applications. The service features both a free tier for initial development and innovation, which includes rate limits, and a premium tier that offers unlimited access once applications are production-ready.

The standardized API layer facilitates easy switching between different LLM models, such as Gemini or Anthropic, by simply modifying the model ID within the API call. An API gateway enforces token-based and request-based rate throttling, while an API portal enables users to monitor cost usage and view metrics. Over the past year, this service has processed 325 million LLM requests, involving 1.45 trillion tokens, and is utilized by approximately 6,200 teams across Cisco.

▶ Watch this segment — 14:50

Cisco Rolls Out No-Code 'Agent as a Service' for Automated Agent Creation

Cisco has introduced an innovative "Agent as a Service" offering, a no-code agentic tool designed to simplify the creation and deployment of AI agents. Users define their agent requirements, which are then clarified through an interactive interview process. Once requirements are finalized, the system moves to a design phase, presenting a LangGraph architecture for approval. Following this, code is automatically generated, undergoes an automated review by another agent that identifies and fixes errors, and is then deployed to Google Cloud Platform (GCP).

This service allows users to visualize LangGraph nodes and interact with the agents through a chat interface. The platform aims to empower anyone within the enterprise to create agents by connecting to existing Multi-Cloud Platform (MCP) servers registered within Cisco's ecosystem. This centralized registry of MCP servers is intended to foster significant innovation by making agent development accessible to a broader audience.

▶ Watch this segment — 21:05

Cisco's Circuit AI Tool Achieves Widespread Adoption, Processes 125,000 Daily Prompts

Circuit, Cisco's primary AI tool, has garnered significant internal adoption, now serving over 100,000 users who collectively generate an average of 125,000 prompts daily. Users have reported high levels of productivity with the tool. For enterprises seeking to implement similar AI platforms, key architectural layers are crucial: LLM observability for monitoring and auditing, FinOps for calculating token usage and chargebacks, and robust model hosting with capacity management to ensure sufficient TPM (tokens per minute) for all clients and assistants.

Further architectural considerations include built-in orchestration to intelligently route user prompts to the correct agents, a UI and app services layer, agentic and MCP services, a search engine layer, an AI-ready data layer with supporting data sources, and an ingestion layer for unstructured enterprise data. The platform also emphasizes a comprehensive security layer, integrating with Cisco AI defense, guardrails, and secure search capabilities, along with connections to external databases and remotely hosted MCP servers or agents.

▶ Watch this segment — 12:49

Also mentioned in this video

Summarised from Outshift by Cisco · 27:45. All credit belongs to the original creators. Streamed.News summarises publicly available video content.

Streamed.News

Convert your full video library into a digital newspaper.

Get this for your newsroom →

Cisco's AI Ecosystem Leverages Google Cloud for Multi-Regional Microservices

Cisco's AI Ecosystem Leverages Google Cloud for Multi-Regional Microservices

Cisco Implements LangGraph-Based RAG for Enhanced Answer Generation

Automated Infrastructure Provisioning Powers Cisco's RAG as a Service

Cisco's RAG System Features Auditable Content Ingestion via Google Cloud Storage

Cisco Offers LLM as a Service with Standardized API and Tiered Access

Cisco Rolls Out No-Code 'Agent as a Service' for Automated Agent Creation

Cisco's Circuit AI Tool Achieves Widespread Adoption, Processes 125,000 Daily Prompts

Also mentioned in this video

More from

Cisco Develops AI Agents for Troubleshooting and Compliance in Cisco IQ

AI Agent Automates Post-Outage Ticket Management