Hackathon Registration Opening Soon

Conversational AI Hackathon ($10k Prize)

Prize Pool

$10,000 USD – awarded to top solutions

Who Can Enter

Open worldwide – software engineers, AI researchers, indie developers

Submission Deadline

May 10, 2026

Format

Remote/online – submit via GitHub plus a video demo

Judging

Proxa Labs engineering and product teams

Bonus Award

Invitation to future consulting work to continue building out the platform, plus public recognition and portfolio placement in Proxa Labs marketing

Background and Problem Statement

Proxa Echo is an AI-powered roleplay and certification platform built for pharmaceutical and life sciences sales organizations. It enables sales representatives to practice high-stakes conversations with simulated healthcare professionals (HCPs) — building confidence, clinical credibility, and compliance readiness before ever entering the field.

The core experience requires a lifelike, conversational AI avatar — one that can listen, respond in real time, and simulate a realistic face-to-face interaction. We are currently evaluating alternatives to our existing third-party avatar provider due to three critical limitations:

•       Token costs at scale are prohibitive for enterprise deployment

•       The API was designed for one-way video generation (marketing, training), not real-time conversational AI

•       Reliance on a closed third-party platform creates unacceptable product risk — we have no control over pricing changes, API deprecations, or feature direction

We explored available open-source options and found none that meet the requirements of low-latency, conversational, avatar-based interaction. That gap is the opportunity this hackathon is designed to solve.

 

What We’re Looking For

We are looking for working software — not concepts or mockups — that demonstrates a real-time, conversational AI avatar capable of being integrated into a web-based application. The solution should be open-source or licensable by Proxa Labs for commercial use.

 

Core Requirements

 

•       Real-time Lip Sync – Avatar speech must be synchronized to audio in real time. Latency from audio input to visible lip movement should be under 300 ms for a natural conversational feel.

•       Natural Language Input – The system must accept voice input from the user, transcribe it, and pass it to an LLM (e.g., Claude, GPT-4) to generate the avatar’s response.

•       LLM Integration – Must support integration with the Anthropic Claude or OpenAI GPT-4 APIs to power the avatar’s dialogue and personality.

•       Expressive Facial Animation – The avatar should display natural head movement, eye movement, and basic emotional expression (neutral, engaged, skeptical, positive).

•       Web-Based Delivery – The solution must run in a modern browser (Chrome/Safari) without requiring local software installation.

•       Persona Configuration – Ability to configure the avatar’s name, role, personality traits, and dialogue behavior via a system prompt or config file.

•       Session State Management – The avatar should maintain conversational context across a multi-turn session (minimum 10–15 exchanges).

•       Mobile Browser Support – Must work in iOS Safari and Android Chrome.
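To make the persona-configuration requirement concrete, here is a minimal sketch of a config object compiled into a system prompt at session start. All field names and the prompt wording are illustrative assumptions, not a Proxa Echo schema:

```typescript
// Illustrative persona schema -- field names are assumptions, not a Proxa Echo spec.
interface HcpPersona {
  name: string;
  role: string;                                            // e.g. "Cardiologist"
  traits: string[];                                        // personality descriptors
  mood: "neutral" | "engaged" | "skeptical" | "positive";  // matches the expression set above
  productContext: string;
}

// Compile the persona into a system prompt string for the LLM.
function buildSystemPrompt(p: HcpPersona): string {
  return [
    `You are ${p.name}, a ${p.role}.`,
    `Personality traits: ${p.traits.join(", ")}.`,
    `Current mood: ${p.mood}.`,
    `Conversation context: ${p.productContext}`,
    `Stay in character and respond as this person would in a live sales call.`,
  ].join("\n");
}

const persona: HcpPersona = {
  name: "Dr. Chen",
  role: "Cardiologist",
  traits: ["time-pressed", "data-driven"],
  mood: "skeptical",
  productContext: "The rep is presenting a new anticoagulant.",
};

const systemPrompt = buildSystemPrompt(persona);
```

The same object could equally live in a JSON config file; the point is that persona identity, mood, and product context are data, not code.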

 

Bonus Features (Score Higher)

Solutions that include any of the following will receive additional scoring consideration:

•       Voice-to-voice with less than 500 ms end-to-end latency

•       Custom avatar appearance via photo upload or 3D model

•       Compliance monitoring hook — ability to flag specific phrases or topics in real time alongside the conversation

•       Scoring and feedback module — post-session summary of conversation quality

•       Mobile browser support (iOS Safari / Android Chrome)

•       Multi-language support
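The compliance-monitoring hook above could be as simple as scanning each transcribed utterance against a configurable phrase list as it arrives. A hedged sketch — the phrase list and flag shape are placeholders, not a required interface:

```typescript
// A flagged phrase and where it appeared, for real-time UI highlighting.
interface ComplianceFlag {
  phrase: string;
  index: number; // character offset in the utterance
}

// Scan one transcript chunk for flagged phrases (case-insensitive).
function scanUtterance(text: string, flaggedPhrases: string[]): ComplianceFlag[] {
  const lower = text.toLowerCase();
  const flags: ComplianceFlag[] = [];
  for (const phrase of flaggedPhrases) {
    const needle = phrase.toLowerCase();
    let from = 0;
    while (true) {
      const idx = lower.indexOf(needle, from);
      if (idx === -1) break;
      flags.push({ phrase, index: idx });
      from = idx + needle.length;
    }
  }
  return flags;
}

const flags = scanUtterance(
  "This drug is guaranteed to cure your patients off-label.",
  ["guaranteed to cure", "off-label"],
);
```

A production version would likely use topic classification rather than literal substrings, but running per transcript chunk keeps the hook real-time.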

 

Judging Criteria

Conversational Realism — latency, lip sync, expressiveness: 40%

Technical Completeness — all core requirements met and functional: 30%

Integration Quality — clean API / SDK that Proxa Labs can adopt: 20%

Code Quality — well-documented, maintainable, production-ready: 10%

 

Submission Requirements

All submissions must include:

•       Public GitHub repository with full source code and an open-source or commercial-compatible license (MIT, Apache 2.0, or equivalent)

•       A README with clear setup instructions, dependencies, and architecture overview

•       A working demo video (3–5 minutes) showing the avatar in a live conversational session — voice in, avatar responds in real time

•       A brief written document (1–2 pages) describing your technical approach, key design decisions, and any known limitations

•       Documentation of any third-party APIs or models used, including licensing terms

Rules and Eligibility

•       Open to individuals and teams worldwide — no geographic restrictions

•       Teams may be up to 4 members; each person may only participate in one team

•       Submissions must be original work created during the hackathon period

•       Use of open-source libraries and pre-trained models is permitted and encouraged

•       Proxa Labs retains the right to negotiate licensing of winning solutions; IP ownership remains with the submitting team unless otherwise agreed

•       Proxa Labs reserves the right to disqualify submissions that do not meet core requirements or contain plagiarized code

•       Prize payments will be made via wire transfer or Wise within 30 days of winner announcement

Technical Context for Builders

To help you build in the right direction, here is how the solution will integrate into Proxa Echo:

•       The avatar engine will be embedded in a React-based web application

•       Each session is initialized with a system prompt defining the HCP persona (name, specialty, personality, mood, product context)

•       The rep’s voice input is captured in the browser, transcribed, and sent to an LLM — the avatar speaks the LLM response

•       Sessions run 5–15 minutes with continuous multi-turn dialogue

•       The solution must support at least 3–5 distinct avatar appearances (different HCP personas)

•       Production infrastructure is AWS-based; solutions should be containerizable (Docker preferred)
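Multi-turn context in this kind of session usually amounts to keeping the message history pinned behind the persona's system prompt and replaying it to the LLM each turn, trimming the oldest exchanges once a cap is hit. A minimal sketch, assuming a generic chat-message shape (not any specific vendor SDK):

```typescript
type Role = "system" | "user" | "assistant";
interface Message { role: Role; content: string }

// Keeps the system prompt pinned and retains at most `maxTurns` user/assistant exchanges.
class SessionContext {
  private history: Message[] = [];
  constructor(private systemPrompt: string, private maxTurns = 15) {}

  // Record one completed exchange (rep utterance + avatar reply).
  addExchange(userText: string, assistantText: string): void {
    this.history.push({ role: "user", content: userText });
    this.history.push({ role: "assistant", content: assistantText });
    // Drop the oldest exchanges beyond the cap (2 messages per exchange).
    const excess = this.history.length - this.maxTurns * 2;
    if (excess > 0) this.history.splice(0, excess);
  }

  // Message array to send to the LLM for the next turn.
  toMessages(): Message[] {
    return [{ role: "system", content: this.systemPrompt }, ...this.history];
  }
}

const session = new SessionContext("You are Dr. Chen, a skeptical cardiologist.", 15);
session.addExchange("Good morning, doctor.", "Morning. I only have a few minutes.");
```

Sliding-window truncation is the simplest approach; summarizing older turns instead would preserve more context within the same token budget.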

Suggested technology stack directions (not prescriptive):

•       Talking head models: SadTalker, DiffTalk, Wav2Lip, MuseTalk

•       TTS: ElevenLabs API, Coqui TTS, Edge TTS, Kokoro

•       STT: Whisper (OpenAI), Deepgram, AssemblyAI

•       Real-time streaming: WebRTC, WebSockets

•       3D avatars: Ready Player Me + Three.js, Avaturn
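If you take the WebSocket route for streaming, one common pattern is to frame each audio chunk with a small binary header (sequence number plus capture timestamp) so the server can reorder chunks and measure end-to-end latency against the 300 ms lip-sync budget. A sketch using plain ArrayBuffers — the header layout here is an assumption, not a standard:

```typescript
// Frame layout (assumption): [u32 seq][f64 captureTimeMs][audio bytes], big-endian.
const HEADER_BYTES = 12;

function encodeFrame(seq: number, captureTimeMs: number, audio: Uint8Array): ArrayBuffer {
  const buf = new ArrayBuffer(HEADER_BYTES + audio.length);
  const view = new DataView(buf);
  view.setUint32(0, seq);            // sequence number for reordering
  view.setFloat64(4, captureTimeMs); // capture time for latency measurement
  new Uint8Array(buf, HEADER_BYTES).set(audio);
  return buf;
}

function decodeFrame(buf: ArrayBuffer): { seq: number; captureTimeMs: number; audio: Uint8Array } {
  const view = new DataView(buf);
  return {
    seq: view.getUint32(0),
    captureTimeMs: view.getFloat64(4),
    audio: new Uint8Array(buf.slice(HEADER_BYTES)),
  };
}
```

WebRTC handles ordering and jitter for you, at the cost of more setup; raw WebSockets with framing like this are simpler to stand up for a hackathon demo.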