<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://gaetanbervet.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://gaetanbervet.com/" rel="alternate" type="text/html" /><updated>2026-06-26T15:32:24+00:00</updated><id>https://gaetanbervet.com/feed.xml</id><title type="html">Gaëtan Bervet</title><subtitle>Working notes on production machine-learning systems.</subtitle><author><name>Gaëtan Bervet</name></author><entry><title type="html">Narcis.ai: Production ML Systems Engineering</title><link href="https://gaetanbervet.com/2025/08/15/narcis-ai-production-ml-systems-engineering.html" rel="alternate" type="text/html" title="Narcis.ai: Production ML Systems Engineering" /><published>2025-08-15T00:00:00+00:00</published><updated>2025-08-15T00:00:00+00:00</updated><id>https://gaetanbervet.com/2025/08/15/narcis-ai-production-ml-systems-engineering</id><content type="html" xml:base="https://gaetanbervet.com/2025/08/15/narcis-ai-production-ml-systems-engineering.html"><![CDATA[<table>
  <tbody>
    <tr>
      <td><strong>Live System</strong>: <a href="https://narcis.ai">narcis.ai</a></td>
      <td><strong>Philosophy</strong>: Production systems built through deep understanding and minimal complexity</td>
    </tr>
  </tbody>
</table>

<p><img src="/assets/main_production_example.png" alt="Production Example" width="400" /></p>

<p><em>Production face identity transformation: Byzantine emperor artistic interpretation generated by the narcis.ai system</em></p>

<h3 id="a-personal-account-of-experience">A Personal Account of Experience</h3>

<h4 id="understand-deeply-implement-minimally">“Understand deeply, implement minimally”</h4>
<p><strong>Master the fundamentals to build the simplest, most natural solution.</strong>
Each implementation, feature delivery, bug fix, and incident response deepens technology mastery and 
strengthens platform capabilities over time.
This means restricting frameworks, technologies, and languages to a manageable core.
<strong>In startups, chasing new technologies is broadly accepted.</strong>
In practice, though, it produces systems whose components duplicate each other’s features while following 
isolated philosophies.
Such systems are a nightmare to maintain, the purpose of each component becomes hard to grasp, and the 
team’s knowledge gets scattered.
Most well-established frameworks have enough subtlety, depth, and supporting tooling to implement the full 
range of features a product needs.
<strong>It’s like having a workshop with so many tools that they’re scattered all over the place.</strong>
Having a tool for every task is great, but not if you can no longer find the one you need, or use so many at 
once that you cannot handle any of them properly.</p>

<h4 id="build-locally-run-remotely">“Build locally, run remotely”</h4>
<p>The large scale and high sensitivity of military-grade systems are such different constraints that they shifted 
how I saw development iterations.
This is actually common when working on distributed systems : the scale of the infrastructure needed, even for
the simplest processing function, forces one to think thoroughly before testing the code.
But it comes with time gains: the code runs on the same CPU as it will in production, the same network 
configuration, the same APIs and already has all the tooling implemented to log, measure and store the 
production components.
<strong>Rapid iteration with production fidelity kills the guess-work of «it works on my machine».</strong>
And by rapid I mean that a couple of minutes is totally fine.
<strong>Spamming the terminal for localhost iterations is dopamine chasing.</strong>
Production reality comes eventually, then why not embrace it from the start?</p>

<h2 id="technical-journey-three-deep-dives">Technical Journey: Three Deep Dives</h2>

<p><strong><a href="/README#page-1-container-orchestration--service-mesh-architecture">Page 1: Container Orchestration &amp; Service Mesh Architecture</a></strong><br />
<em>Infrastructure foundations</em> - ECS cluster orchestration, multi-tier GPU capacity providers, service mesh networking, and AWS resource management patterns that provide the reliability foundation for production ML workloads.</p>

<p><strong><a href="/README#page-2-ml-pipeline-engineering-deep-dive">Page 2: ML Pipeline Engineering Deep Dive</a></strong><br />
<em>Technical implementation</em> - Custom diffusion implementations, differential timestep scheduling, PhotoMaker identity preservation, and tensor operations beyond standard framework usage.</p>

<p><strong><a href="/README#page-3-production-operations--web-platform">Page 3: Production Operations &amp; Web Platform</a></strong><br />
<em>Real-world deployment</em> - Discord bot command interfaces, Remix React web platform, PostgreSQL session management, and unified caching systems that power the live user-facing application.</p>

<hr />

<p><strong>Supporting Materials</strong>: <a href="/technical-materials">Technical code examples and diagrams</a></p>

<h2 id="production-system-highlights">Production System Highlights</h2>

<p><strong>Infrastructure Engineering</strong> (Page 1)</p>
<ul>
  <li><strong>Container Orchestration</strong>: ECS Service Connect with 198 Terraform resources managing multi-tier GPU capacity</li>
  <li><strong>Cost Optimization</strong>: Multi-tier spot instance strategies with automatic failover (G6→G5→G4dn)</li>
  <li><strong>Development Velocity</strong>: 2-3 minute deployment cycles from local development to production GPU environments</li>
</ul>
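<p>The G6→G5→G4dn failover can be pictured as an ordered search over capacity tiers. The sketch below is a hypothetical illustration, not the Terraform/ECS capacity-provider configuration the system actually uses: <code>CapacityTier</code>, <code>pick_tier</code>, and the <code>has_capacity</code> callback are invented names, and the expected latencies are simply the ~20 s (G6/G5) and ~40 s (G4dn) generation times quoted under Production Operations.</p>

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class CapacityTier:
    """One GPU capacity tier (hypothetical model, not a real ECS resource)."""
    instance_family: str
    expected_seconds: float  # approximate generation time per image


# Preference order mirrors the G6 -> G5 -> G4dn failover.
TIERS = (
    CapacityTier("g6", 20.0),
    CapacityTier("g5", 20.0),
    CapacityTier("g4dn", 40.0),
)


def pick_tier(
    tiers: Sequence[CapacityTier],
    has_capacity: Callable[[str], bool],
) -> Optional[CapacityTier]:
    """Return the first tier, in preference order, that currently has capacity.

    `has_capacity` is a hypothetical stand-in for whatever signal reports spot
    availability; returns None when every tier is exhausted.
    """
    for tier in tiers:
        if has_capacity(tier.instance_family):
            return tier
    return None


if __name__ == "__main__":
    # Simulate a spot shortage on G6 and G5: the request falls through to G4dn.
    available = {"g6": False, "g5": False, "g4dn": True}
    tier = pick_tier(TIERS, lambda family: available[family])
    print(tier.instance_family, tier.expected_seconds)  # g4dn 40.0
```

<p>Keeping the order in one place makes the cost/latency trade-off explicit: falling back to G4dn keeps the service up at roughly twice the generation time.</p>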

<p><strong>ML Pipeline Engineering</strong> (Page 2)</p>
<ul>
  <li><strong>Custom Diffusion</strong>: SDXL + PhotoMaker V2 integration with differential timestep scheduling</li>
  <li><strong>Identity Preservation</strong>: Multi-million parameter face identity injection through progressive masking</li>
  <li><strong>Tensor Operations</strong>: Batch processing and classifier-free guidance (CFG) optimization beyond standard frameworks</li>
</ul>
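<p>Classifier-free guidance normally needs two noise predictions per denoising step, one unconditional and one conditioned on the prompt. A common optimization, and one plausible reading of “batch processing and CFG optimization”, is to stack both inputs into a single batch so the model runs one forward pass, then combine the halves as ε = ε<sub>uncond</sub> + w·(ε<sub>cond</sub> − ε<sub>uncond</sub>). This sketch uses plain Python lists and a toy <code>fake_unet</code> in place of the SDXL UNet; <code>guided_noise</code> and the toy model are assumptions for illustration only.</p>

```python
from typing import Callable, List, Sequence

Vector = List[float]
# A "model" maps a batch of latents plus a batch of conditionings to a batch of noise predictions.
Model = Callable[[Sequence[Vector], Sequence[Vector]], List[Vector]]


def guided_noise(
    model: Model,
    latent: Vector,
    uncond_embedding: Vector,
    cond_embedding: Vector,
    guidance_scale: float,
) -> Vector:
    """Classifier-free guidance with both branches evaluated in one batched call."""
    # Duplicate the latent so the unconditional and conditional branches share one batch.
    latents_batch = [latent, latent]
    embeddings_batch = [uncond_embedding, cond_embedding]
    eps_uncond, eps_cond = model(latents_batch, embeddings_batch)
    # eps = eps_uncond + w * (eps_cond - eps_uncond)
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def fake_unet(latents: Sequence[Vector], embeddings: Sequence[Vector]) -> List[Vector]:
    """Toy stand-in for a UNet: prediction = latent + conditioning, element-wise."""
    return [[x + e for x, e in zip(lat, emb)] for lat, emb in zip(latents, embeddings)]


if __name__ == "__main__":
    eps = guided_noise(fake_unet, [1.0, 2.0], [0.0, 0.0], [0.5, -0.5], guidance_scale=7.5)
    print(eps)  # [4.75, -1.75]
```

<p>With <code>guidance_scale=1.0</code> the result reduces to the conditional prediction, and with <code>0.0</code> to the unconditional one, which is a quick sanity check for any batched CFG implementation.</p>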

<p><strong>Production Operations</strong> (Page 3)</p>
<ul>
  <li><strong>Live User System</strong>: Publicly available at <a href="https://narcis.ai">narcis.ai</a> serving artistic transformations</li>
  <li><strong>Discord Operations Interface</strong>: Real-time ML workflow control with parameter tuning and batch generation</li>
  <li><strong>Generation Performance</strong>: ~20 seconds on G6/G5 instances, ~40 seconds on G4dn spot instances</li>
  <li><strong>Remix Web Platform</strong>: PostgreSQL session management and unified caching for instant image loading</li>
</ul>

<h2 id="two-stage-generation-pipeline">Two-Stage Generation Pipeline</h2>

<p><img src="/assets/generation_process.png" alt="Two-Stage Generation Pipeline" width="600" /></p>

<p><em>Complete generation pipeline: from a two-word user input to the final artistic portrait, via a prompt agent and face diffusion</em></p>

<p>The two-stage approach <strong>completely separates artistic composition from face identity</strong>, keeping the AI interpretable and debuggable:</p>
<ul>
  <li><strong>Stage 1</strong>: Pure SDXL text-driven generation without face conditioning</li>
  <li><strong>Stage 2</strong>: Differential diffusion with progressive masking for identity injection</li>
</ul>
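<p>Progressive masking can be sketched as a per-pixel release schedule, in the spirit of differential diffusion: each pixel carries a preservation strength in [0, 1], and at step <em>i</em> of <em>N</em> it is reset to the re-noised Stage 1 image while its strength exceeds <em>i</em>/<em>N</em>, after which the model is free to repaint it. Pixels at strength 0 (for example, the face region) are regenerated from the first step, while pixels at 1 keep the Stage 1 content throughout. This is a simplified, hypothetical illustration on flat lists: <code>release_mask</code>, <code>blend_step</code>, and the threshold rule are assumptions, not the narcis.ai implementation.</p>

```python
from typing import List, Sequence


def release_mask(keep_strength: Sequence[float], step: int, num_steps: int) -> List[bool]:
    """True where a pixel is still held to the Stage 1 image at this step.

    Hypothetical rule: a pixel is held while its preservation strength exceeds
    step / num_steps, so high-strength pixels are released later (or never).
    """
    if not 0 <= step < num_steps:
        raise ValueError("step must be in [0, num_steps)")
    threshold = step / num_steps
    return [k > threshold for k in keep_strength]


def blend_step(
    latents: Sequence[float],
    noised_original: Sequence[float],
    keep_strength: Sequence[float],
    step: int,
    num_steps: int,
) -> List[float]:
    """Overwrite held pixels with the Stage 1 latents re-noised to this step's level."""
    held = release_mask(keep_strength, step, num_steps)
    return [o if h else x for x, o, h in zip(latents, noised_original, held)]


if __name__ == "__main__":
    # Face pixel (0.0), soft edge (0.5), background (1.0) over 4 denoising steps.
    strength = [0.0, 0.5, 1.0]
    for step in range(4):
        print(step, release_mask(strength, step, 4))
```

<p>A graded band of strengths around the face, rather than a hard 0/1 mask, presumably lets the injected identity blend into the surrounding composition over several steps instead of at a single seam.</p>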

<h2 id="live-production-web-interface">Live Production Web Interface</h2>

<p><img src="/assets/photowall.png" alt="PhotoWall Gallery Interface" width="600" /></p>

<p><em>Multi-column infinite scroll portrait gallery showing AI-generated artistic transformations with real-time updates</em></p>

<p>The production system at <a href="https://narcis.ai">narcis.ai</a> puts both principles into practice: 
<strong>understand deeply</strong> through constrained technology choices (ECS, PyTorch, Remix), and 
<strong>build locally, run remotely</strong> with 2-3 minute deploy cycles from laptop to production GPU.</p>

<p><strong><a href="/README">→ Read the complete technical writeup</a></strong> to see these principles applied across infrastructure, ML engineering, and production operations.</p>]]></content><author><name>Gaëtan Bervet</name></author><summary type="html"><![CDATA[Production systems built through deep understanding and minimal complexity — infrastructure foundations, a two-stage diffusion pipeline, and what it takes to run face-identity generation live.]]></summary></entry></feed>