Skillcraft: AI-enhanced assessment authoring

The problem: A founder bottleneck

Skillcraft had validated product-market fit for their psychometric skill assessment platform with their first B2B customer, but quickly hit a critical scaling ceiling. The CEO—who brought a decade of neuroscience and human-performance research from Oxford, Stanford, and Harvard—was personally hand-crafting every assessment. This consultative approach produced excellent results, but created an obvious bottleneck: the company could only serve as many customers as the CEO had hours in the day.

Each assessment took approximately two weeks to create, involving complex cognitive work: defining behavioral signals for skills, creating realistic business scenarios, writing rubrics that discriminate top performers from lower performers, and configuring AI-powered roleplay conversations. Without a way to accelerate this process, the company couldn't scale past 1-2 customers.

My role

Hired to lead design at the then two-person startup, I owned the complete design process from research through implementation. I was responsible for strategic product thinking (what problems to solve and in what order), user research and workflow design (understanding how assessment authors think), AI workflow integration (where and how to embed AI assistance), information architecture (the behavioral signal system, two-layer scenario structure), and UI design and prototyping using AI-accelerated tools like Lovable.

I worked directly with the CEO and CTO, translating the CEO's assessment science expertise into scalable product workflows. This required understanding psychometric validity deeply enough to make design decisions that protected measurement integrity—for example, deciding what content could be AI-generated vs. what required human review.

The process

This wasn't a polish-the-UI project. It was a 0 → 1 design challenge that required understanding both the science of assessment design and the pragmatics of early-stage startup constraints. Here's how it unfolded:

Understanding the expert workflow

I shadowed the CEO as he created assessments manually, documenting his decision-making process. Key insight: he followed a consistent cognitive sequence—define competencies → break into behavioral signals → design scenarios to elicit those signals → create rubrics for evaluation. This became the structural foundation of the authoring tool.

Identifying "blank screen moments"

Rather than adding generic "AI magic wand" buttons, I mapped specific points in the workflow where authors would stall: generating behavioral signals from skill descriptions, creating realistic business scenarios, authoring background information for AI roleplay agents. These became the integration points for AI assistance—not replacing expertise, but accelerating it.

Rapid prototyping with AI-accelerated design

Using tools like Lovable, I compressed iteration cycles from weeks to days. This speed was essential at a startup racing against cashflow constraints. The philosophy: AI accelerates execution of design thinking; it doesn't replace it. Strategic decisions about workflow, architecture, and measurement validity still required human judgment.

Key design features

Beyond a standardized, reusable framework for structuring and presenting assessments, the most transformative features used AI to make authors roughly 10x more efficient. These features include:

AI-assisted behavioral signal authoring. The first step in creating a psychometrically valid skill assessment is defining behavioral signals: specific, observable actions that demonstrate competency in a given skill. When an author describes a skill in general terms, the AI assistant I created generates 5-10 behavioral signals as a solid starting point for hands-on refinement, not as a replacement for expertise.

Edit Skill modal showing structured Notes and Behavioral Signals tabs — The Edit Skill modal, showing AI-generated behavioral signals ready for expert refinement
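To make the "starting point, not replacement" idea concrete, here is a minimal sketch of how AI-suggested signals could be held as drafts until an expert approves them. It is an illustration under stated assumptions, not Skillcraft's implementation: the `SignalDraft` type, the `parse_signal_drafts` function, and the one-signal-per-line output format are all hypothetical, and no model call is shown.

```python
import re
from dataclasses import dataclass

# Matches a leading list marker such as "- ", "* ", "1. " or "2) ".
_LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s*")


@dataclass
class SignalDraft:
    """One AI-suggested behavioral signal awaiting expert review (hypothetical type)."""
    text: str
    source: str = "ai"       # who proposed the signal
    approved: bool = False   # AI-generated content is never approved by default


def parse_signal_drafts(raw_output: str, max_signals: int = 10) -> list[SignalDraft]:
    """Turn line-per-signal model output (assumed format) into reviewable drafts.

    Strips list markers, skips blank lines and case-insensitive duplicates,
    and caps the result at max_signals.
    """
    drafts: list[SignalDraft] = []
    seen: set[str] = set()
    for line in raw_output.splitlines():
        text = _LIST_MARKER.sub("", line).strip()
        key = text.lower()
        if not text or key in seen:
            continue
        seen.add(key)
        drafts.append(SignalDraft(text=text))
        if len(drafts) == max_signals:
            break
    return drafts
```

In this sketch every draft starts with `approved=False`, so the only way a signal reaches an assessment is through an explicit human decision.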

Automatic narrative generation. One of the most labor-intensive parts of creating an interactive skill assessment is inventing the fictional companies, characters, and situations that underpin its scenes and interactions. Once the author selects the skills to detect, the platform's AI assistant can create an entire fictional backstory built around situations that give the participant opportunities to demonstrate those skills.

Create Scenario with AI modal alongside the Edit Scenario editor — The Create Scenario flow uses AI to generate full fictional narratives tailored to targeted skills

AI-generated visual assets. Once a fictional scenario narrative exists, the individual "scenes" within it need visual design assets. In the assessment player, participants see "background cards" that contain company names, details, and other information, including a logo. Here too, an AI assistant generates these cards from the already generated fictional narrative.

Leaving the platform to generate logos broke the authoring flow, so I created an embedded logo generator that makes logo generation a seamless part of card creation.

An embedded logo generator keeps authors in flow while creating background cards

AI-generated roleplay agent backgrounds. One of the most advanced features of the Skillcraft assessment platform is AI roleplay, where users hold an actual conversation with a fictional character from the scenario narrative. For this experience to be effective, the AI character needs a full backstory: their current situation, motivations, concerns, and so on. Here again, AI draws on the already established fictional narrative to create the character's backstory, seeded with details and clues designed to elicit the targeted skills.

Add Challenge modal with AI Conversation Settings alongside the candidate-facing Sales Roleplay screen — AI roleplay agents are configured with rich, scenario-grounded backstories

The impact

The transformation was dramatic—both quantitatively and qualitatively:

Assessment creation: 2 weeks → less than 1 day. Counted in working days, that is roughly a 10x efficiency improvement, and it directly removed the founder bottleneck. Other team members could now create most assessments while the CEO focused on vital customer relationships and business development.

Designing the platform increased product focus. Creating the assessment editing platform forced strategic conversations about how assessments should be structured and why. Being able to rapidly create and test different assessment configurations led us to core decisions about the structure of future assessments.

The rapid development of assessments led directly to increased business funding. One of our primary customers needed demonstrable evidence of what the future held in order to give us a significant advance in funding. By rapidly building multiple impressive demonstration scenarios, we were able to secure this funding from the customer without hesitation.

Lessons learned

Domain expertise is a design material

The CEO's assessment science knowledge wasn't just context for the design—it was the primary material I was working with. My job wasn't to make generic authoring tools; it was to encode a decade of research expertise into workflows that non-experts could follow. Understanding psychometric validity well enough to design for it elevated this from UI polish to strategic product work.

AI acceleration requires strategic placement

The difference between useful AI assistance and noise is knowing where to integrate it. Mapping the expert workflow first, then identifying "blank screen moments" where authors would stall, let me place AI features where they'd actually help rather than adding generic "magic wand" buttons everywhere. This thoughtful integration is why the platform achieved 10x efficiency gains instead of marginal improvements.

Architecture decisions compound over time

The behavioral signal architecture—structuring skills as decomposable, mappable entities rather than free-text labels—wasn't just organizational tidiness. It enabled AI assistance (structured data is easier to work with), protected measurement validity (explicit traceability), and created foundations for future features like analytics and reporting. Early structural decisions have long-term leverage.
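As an illustration of what "decomposable, mappable entities" and "explicit traceability" can mean in practice, the sketch below models skills, signals, and scenarios as structured records and reports any signal that no scenario is designed to elicit. All names here (`BehavioralSignal`, `Skill`, `Scenario`, `uncovered_signals`) are hypothetical; the actual Skillcraft schema is not described in this case study.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BehavioralSignal:
    """An observable action that demonstrates a skill (hypothetical shape)."""
    id: str
    description: str


@dataclass
class Skill:
    """A skill decomposed into explicit signals rather than a free-text label."""
    name: str
    signals: list[BehavioralSignal] = field(default_factory=list)


@dataclass
class Scenario:
    """A scenario mapped to the ids of the signals it is meant to elicit."""
    title: str
    elicits: set[str] = field(default_factory=set)


def uncovered_signals(skills: list[Skill], scenarios: list[Scenario]) -> list[str]:
    """Return ids of signals that no scenario elicits, in declaration order."""
    covered: set[str] = set().union(*(s.elicits for s in scenarios))
    return [sig.id for skill in skills for sig in skill.signals if sig.id not in covered]
```

Because each signal has a stable id, a check like this can flag measurement gaps before an assessment ships, and the same mapping could later feed analytics and reporting.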
