Projects

Reasoning System

Plan-Act-Verify Biomedical Reasoning

A CURE-Bench Plan-Act-Verify system that uses a model planner, biomedical tools, curated Tool Facts, and a final answer pass.

Published accuracy: 0.69564 (fine-tuned GPT-4.1 + tools)

Passes: 2 (plan, then answer/verify)

Tool families: 6 (FDA, DailyMed, RxNav, more)

overview

When OpenRouter and ToolUniverse are available, the backend runs the original biomedical pipeline live. In local recorded mode, it instead displays a trace from an original submission CSV, including the real plan, tool calls, Tool Facts, and final answer.
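The choice between live and recorded mode could be sketched as below. This is a minimal illustration, not the project's actual logic: the environment variable name `OPENROUTER_API_KEY`, the module name `tooluniverse`, and the function `select_mode` are all assumptions made for the example.

```python
import importlib.util
import os
from typing import Mapping, Optional


def select_mode(env: Optional[Mapping[str, str]] = None,
                tool_module: str = "tooluniverse") -> str:
    """Return "live" only when both prerequisites are present, else "recorded".

    Assumptions (not confirmed by the project): the credential lives in
    OPENROUTER_API_KEY and ToolUniverse is importable as `tooluniverse`.
    """
    env = os.environ if env is None else env
    has_key = bool(env.get("OPENROUTER_API_KEY", "").strip())
    # find_spec returns None when a top-level module is not installed.
    has_tools = importlib.util.find_spec(tool_module) is not None
    return "live" if has_key and has_tools else "recorded"
```

Passing the environment in as a parameter keeps the check easy to exercise without touching real credentials.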

role

Portfolio integration: converted the original benchmark pipeline and submission artifacts into a readable display of the question, plan, tool retrieval, verification, and answer.

backend runner

Run original project workflow

Calls /api/biomedical/run on the FastAPI wrapper at http://localhost:8000.
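A client call to this endpoint might look like the sketch below. Only the path and host come from the text above; the POST method, the JSON field names `question` and `choices`, and the helper names are assumptions for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_run_request(question: str, choices: dict[str, str]) -> urllib.request.Request:
    """Build the HTTP request without sending it."""
    # Hypothetical payload shape; the real request schema is not documented here.
    payload = {"question": question, "choices": choices}
    return urllib.request.Request(
        f"{BASE_URL}/api/biomedical/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the endpoint accepts POST
    )


def run(question: str, choices: dict[str, str], timeout: float = 60.0) -> dict:
    """Send the request to a running backend and decode the JSON reply."""
    request = build_run_request(question, choices)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating request construction from sending makes the payload inspectable even when no backend is running.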

backend contract

Start the backend with uvicorn backend.main:app --reload --port 8000. The UI displays mode and provenance so recorded artifacts are clearly distinguished from live computation.
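One way to keep recorded artifacts distinguishable from live computation is to carry mode and provenance in every result. The sketch below is illustrative only: `RunResult`, its fields, and `provenance_banner` are hypothetical names, not the backend's actual response schema.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class RunResult:
    """Hypothetical result envelope; field names are assumptions."""
    mode: Literal["live", "recorded"]
    provenance: str  # e.g. a model name for live runs, a file name for recorded ones
    answer: str


def provenance_banner(result: RunResult) -> str:
    """Return the label a UI could show next to the answer."""
    if result.mode == "live":
        return f"LIVE run ({result.provenance})"
    return f"RECORDED artifact ({result.provenance})"
```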

architecture flow

Agent and model flow

The live pipeline trace appears in the backend runner after execution. This section shows the original project components that the backend wraps.

01

planner

GPT5Model.plan

plan JSON

Analyzes the stem and choices, extracts keywords, selects facts needed, and proposes biomedical tools.
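Since the planner emits plan JSON, a consumer needs to parse it defensively. The following is a rough sketch under assumptions: the keys `keywords`, `facts_needed`, and `tools` are guessed from the description above, and `Plan` and `parse_plan` are hypothetical names rather than part of `GPT5Model`.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical plan shape inferred from the planner description."""
    keywords: list[str] = field(default_factory=list)
    facts_needed: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)


def parse_plan(raw: str) -> Plan:
    """Parse planner output, falling back to an empty plan on bad input."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return Plan()
    if not isinstance(data, dict):
        return Plan()

    def strings(key: str) -> list[str]:
        # Accept only list values; anything else is treated as missing.
        value = data.get(key, [])
        return [str(item) for item in value] if isinstance(value, list) else []

    return Plan(strings("keywords"), strings("facts_needed"), strings("tools"))
```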

02

retrieval

ToolAgent.collect

tool calls

Runs curated biomedical tools and records success/failure traces for evidence gathering.
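Recording both successes and failures can be sketched as a loop that never lets one failing tool abort the rest. This is an illustration, not the real `ToolAgent.collect`: the call format, the trace fields, and the registry of plain Python callables are all assumptions.

```python
from typing import Any, Callable

Tool = Callable[..., Any]


def collect(calls: list[dict], registry: dict[str, Tool]) -> list[dict]:
    """Run each requested tool and return one trace per call."""
    traces = []
    for call in calls:
        name = call["tool"]
        args = call.get("args", {})
        trace = {"tool": name, "args": args, "ok": False, "result": None, "error": None}
        try:
            trace["result"] = registry[name](**args)
            trace["ok"] = True
        except Exception as exc:
            # Unknown tools and tool errors are both recorded, not raised.
            trace["error"] = f"{type(exc).__name__}: {exc}"
        traces.append(trace)
    return traces
```

Keeping failed calls in the trace means a later display can show which evidence was attempted but unavailable.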

03

filter

Tool Fact Curator

10 facts max

Filters, deduplicates, clips, and diversifies successful facts before the answer pass.
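The four curation steps can be illustrated as follows. Only the cap of 10 facts comes from the text above; the 300-character clip, the round-robin diversification across tools, and the trace fields are assumptions chosen for the sketch.

```python
from collections import defaultdict
from itertools import zip_longest

MAX_FACTS = 10   # stated cap
MAX_CHARS = 300  # assumption: the real clip length is not documented


def curate(traces: list[dict], max_facts: int = MAX_FACTS,
           max_chars: int = MAX_CHARS) -> list[dict]:
    """Filter, deduplicate, clip, and diversify tool results."""
    seen = set()
    by_tool = defaultdict(list)
    for trace in traces:
        if not trace.get("ok"):
            continue  # filter: keep successful calls only
        raw = trace.get("result")
        text = "" if raw is None else " ".join(str(raw).split())
        key = text.lower()
        if not text or key in seen:
            continue  # deduplicate on whitespace- and case-normalized text
        seen.add(key)
        by_tool[trace["tool"]].append({"tool": trace["tool"], "text": text[:max_chars]})
    # Diversify: take one fact per tool in turn so no single tool dominates.
    interleaved = [fact for group in zip_longest(*by_tool.values())
                   for fact in group if fact is not None]
    return interleaved[:max_facts]
```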

04

verifier

Pass-2 Answer Prompt

Final answer

Combines prior analysis, curated facts, and the full MCQ to produce one final answer letter.
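A second-pass prompt of this kind, plus extraction of the single answer letter, might be sketched as below. The prompt wording, the function names, and the rule of taking the last standalone capital letter are assumptions, not the project's actual Pass-2 prompt.

```python
import re
from typing import Optional


def build_answer_prompt(question: str, choices: dict[str, str],
                        analysis: str, facts: list[str]) -> str:
    """Combine the full MCQ, prior analysis, and curated facts into one prompt."""
    choice_lines = "\n".join(f"{key}. {text}" for key, text in sorted(choices.items()))
    fact_lines = "\n".join(f"- {fact}" for fact in facts) or "- (none)"
    return (
        f"Question:\n{question}\n\nChoices:\n{choice_lines}\n\n"
        f"Prior analysis:\n{analysis}\n\nTool Facts:\n{fact_lines}\n\n"
        "Reply with exactly one answer letter."
    )


def extract_letter(reply: str, valid: str = "ABCD") -> Optional[str]:
    """Return the last standalone capital letter among the valid options."""
    matches = re.findall(rf"\b([{valid}])\b", reply)
    return matches[-1] if matches else None
```

Taking the last match rather than the first avoids mistaking a sentence-initial article "A" for the answer.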

example input

A pediatric generalized myasthenia gravis multiple-choice question with drug choices.

final result

The backend returns either a live original pipeline run or a provenance-backed recorded submission row from the original CURE-Bench outputs.
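Serving a recorded row with its provenance attached could look like the sketch below. The column name `id`, the function `find_recorded_row`, and the returned keys are hypothetical; the real submission CSV layout is not described here.

```python
import csv
import io
from typing import Optional


def find_recorded_row(csv_text: str, question_id: str, source: str) -> Optional[dict]:
    """Look up one submission row and tag it as a recorded artifact.

    Assumes a header row with an `id` column (hypothetical).
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("id") == question_id:
            # Attach mode and provenance so the row cannot pass as a live run.
            return {"mode": "recorded", "provenance": source, **row}
    return None
```

Returning `None` for an unknown identifier reflects that recorded mode only covers questions present in the original outputs.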

limitations

  • Live mode requires OpenRouter credentials and ToolUniverse dependencies.
  • Recorded mode replays original artifacts; it cannot answer arbitrary new biomedical questions.
  • This is benchmark reasoning output, not medical advice.