Narcis.ai: Production ML Systems Engineering

Live System: narcis.ai

Philosophy: Production systems built through deep understanding and minimal complexity

Production Example

Production face identity transformation: Byzantine emperor artistic interpretation generated by the narcis.ai system

A Personal Account of Experience

“Understand deeply, implement minimally”

Master the fundamentals to build the simplest, most natural solution. Each implementation, feature delivery, bug fix, and incident response deepens mastery of the technology and strengthens the platform over time. This means restricting the range of frameworks, technologies, and languages to a manageable core. In startups, chasing new technologies is widely accepted. In practice, it produces systems with redundant features spread across components that each follow their own philosophy. Such systems are a nightmare to maintain, the purpose of each component is hard to understand, and the team’s knowledge gets scattered. Most well-established frameworks have enough subtlety, depth, and side tooling to implement the full range of features. It’s like a workshop with so many tools that they’re scattered all over the place. Having a tool for every task is great, but not if you can no longer find the one you need, or if you use so many at once that you never learn to handle any of them properly.

“Build locally, run remotely”

The large scale and high sensitivity of military-grade systems impose such different constraints that they changed how I see development iterations. This is actually common when working on distributed systems: the scale of the infrastructure needed, even for the simplest processing function, forces one to think thoroughly before testing the code. But it also saves time: the code runs on the same CPU as it will in production, with the same network configuration and the same APIs, and the tooling to log, measure, and store is already implemented for the production components. Rapid iteration with production fidelity kills the guesswork of “it works on my machine”. And by rapid I mean that a couple of minutes is totally fine. Spamming the terminal for localhost iterations is dopamine chasing. Production reality comes eventually, so why not embrace it from the start?

Technical Journey: Three Deep Dives

Page 1: Container Orchestration & Service Mesh Architecture
Infrastructure foundations - ECS cluster orchestration, multi-tier GPU capacity providers, service mesh networking, and AWS resource management patterns that provide the reliability foundation for production ML workloads.

Page 2: ML Pipeline Engineering Deep Dive
Technical implementation - Custom diffusion implementations, differential timestep scheduling, PhotoMaker identity preservation, and tensor operations beyond standard framework usage.

Page 3: Production Operations & Web Platform
Real-world deployment - Discord bot command interfaces, Remix React web platform, PostgreSQL session management, and unified caching systems that power the live user-facing application.

Supporting Materials: Technical code examples and diagrams

Production System Highlights

Infrastructure Engineering (Page 1)

Container Orchestration: ECS Service Connect with 198 Terraform resources managing multi-tier GPU capacity
Cost Optimization: Multi-tier spot instance strategies with automatic failover (G6→G5→G4dn)
Development Velocity: 2-3 minute deployment cycles from local development to production GPU environments
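
The G6→G5→G4dn failover order above can be pictured as an ordered fallback. The following is a minimal sketch, not the system's actual Terraform or ECS configuration: the tier names come from the text, while `pick_tier` and the availability map are hypothetical names introduced for illustration.

```python
from typing import Mapping, Optional, Sequence

# Preference order taken from the text: preferred tier first, last-resort tier last.
TIER_ORDER: Sequence[str] = ("g6", "g5", "g4dn")


def pick_tier(available: Mapping[str, int],
              order: Sequence[str] = TIER_ORDER) -> Optional[str]:
    """Return the first tier in `order` that has at least one free instance.

    `available` maps a tier name to the number of instances that could be
    placed right now. This input is an assumption for the sketch; in the real
    system the decision would be made by the infrastructure layer.
    """
    for tier in order:
        if available.get(tier, 0) > 0:
            return tier
    return None  # no capacity on any tier: the caller should queue or retry
```

With `{"g6": 0, "g5": 2, "g4dn": 5}` the sketch falls through the exhausted G6 tier and selects `"g5"`. Keeping the order in a single tuple means the preference can be changed in one place.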

ML Pipeline Engineering (Page 2)

Custom Diffusion: SDXL + PhotoMaker V2 integration with differential timestep scheduling
Identity Preservation: Multi-million parameter face identity injection through progressive masking
Tensor Operations: Batch processing and CFG optimization beyond standard frameworks
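
For readers unfamiliar with CFG (classifier-free guidance), here is a minimal sketch of the standard formula with the conditional and unconditional predictions computed in one batched call. The text does not describe what the system's own CFG optimization is, so this shows only the textbook baseline. Plain lists stand in for tensors, and `guided_noise` and the `model` signature are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical model signature: takes a batch of (latent, embedding) pairs
# and returns one noise prediction per pair.
BatchedModel = Callable[[Sequence[Tuple[Vector, Vector]]], List[Vector]]


def cfg_combine(uncond: Sequence[float], cond: Sequence[float],
                scale: float) -> Vector:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def guided_noise(model: BatchedModel, latent: Vector, cond_emb: Vector,
                 uncond_emb: Vector, scale: float) -> Vector:
    """Run both branches in a single batched call, then combine them."""
    uncond_pred, cond_pred = model([(latent, uncond_emb), (latent, cond_emb)])
    return cfg_combine(uncond_pred, cond_pred, scale)
```

A scale of 1.0 returns the conditional prediction unchanged; larger values push the result further from the unconditional prediction.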

Production Operations (Page 3)

Live User System: Publicly available at narcis.ai serving artistic transformations
Discord Operations Interface: Real-time ML workflow control with parameter tuning and batch generation
Generation Performance: ~20 seconds on G6/G5 instances, ~40 seconds on G4dn spot instances
Remix Web Platform: PostgreSQL session management and unified caching for instant image loading

Two-Stage Generation Pipeline

Complete generation pipeline: From 2-word user input to final artistic portrait via prompt agent and face diffusion

Two-stage approach: complete separation of artistic composition from face identity, for interpretable, debuggable AI:

Stage 1: Pure SDXL text-driven generation without face conditioning
Stage 2: Differential diffusion with progressive masking for identity injection
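
The progressive masking in Stage 2 can be sketched as a mask that grows over the denoising steps. This is an illustration under stated assumptions, not the system's implementation: the linear threshold schedule, the flat list standing in for a latent tensor, and the names `progressive_masks` and `blend` are all hypothetical.

```python
from typing import List, Sequence


def progressive_masks(change_map: Sequence[float],
                      num_steps: int) -> List[List[bool]]:
    """For each denoising step, mark which positions may be modified.

    `change_map` holds a strength in [0, 1] per position (1 = change the
    most, 0 = never change). The threshold falls linearly (an assumed
    schedule), so high-strength positions become editable early and
    low-strength positions only near the end.
    """
    masks = []
    for step in range(num_steps):
        threshold = 1.0 - (step + 1) / num_steps
        masks.append([strength > threshold for strength in change_map])
    return masks


def blend(base: Sequence[float], edited: Sequence[float],
          mask: Sequence[bool]) -> List[float]:
    """Keep `base` where the mask is off, take `edited` where it is on."""
    return [e if m else b for b, e, m in zip(base, edited, mask)]
```

In this sketch the Stage 1 result would play the role of `base` and the face-conditioned prediction the role of `edited`, blended once per step. A position with strength 0 is never editable, which is how the composition outside the face region would stay untouched.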

Live Production Web Interface

PhotoWall Gallery Interface

Multi-column infinite scroll portrait gallery showing AI-generated artistic transformations with real-time updates

The production system at narcis.ai embodies these principles in practice: understand deeply through constrained technology choices (ECS, PyTorch, Remix), and build locally, run remotely with 2-3 minute deploy cycles from laptop to production GPU.

→ Read the complete technical writeup to see these principles applied across infrastructure, ML engineering, and production operations.