
API Service Template

FastAPI + ML Models template for production-ready AI services

Quick Start

# Clone template
git clone https://github.com/wearehybrid/api-template.git
cd api-template

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set environment variables
cp .env.example .env

# Start development server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
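
Once the server is up, a quick smoke test is to call the health endpoint. A minimal sketch using the requests library (the /api/v1/health route matches the endpoint list later on this page; adjust if your routes differ):

import requests

# Assumes the dev server from the Quick Start is running locally
BASE_URL = "http://localhost:8000"

resp = requests.get(f"{BASE_URL}/api/v1/health")
print(resp.status_code)  # expect 200 when the service is healthy
print(resp.json())       # payload defined in app/api/v1/endpoints/health.py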

Features

🚀 Production Ready

  • Async request handling
  • Automatic API documentation
  • Request validation with Pydantic
  • Error handling & logging
  • Health checks & monitoring

🤖 AI Integration

  • Model loading & caching
  • Batch inference support (see the sketch after this list)
  • Multiple model backends
  • GPU acceleration
  • Model versioning
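
In PyTorch terms, batch inference usually just means stacking inputs into a single tensor and running one forward pass instead of looping per item. A minimal sketch of the idea (not the template's actual implementation):

import torch

def predict_batch(model: torch.nn.Module, rows: list[list[float]]) -> list[list[float]]:
    # Stack all inputs into one (batch_size, n_features) tensor
    batch = torch.tensor(rows)
    with torch.no_grad():   # inference only, no autograd bookkeeping
        out = model(batch)  # one forward pass for the whole batch
    return out.tolist()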

API Architecture

📡 Client (HTTP requests) → ⚡ FastAPI (request processing) → 🧠 Models (AI inference) → 📊 Response (JSON results)

Code Structure

api-service/
├── app/
│   ├── api/
│   │   ├── v1/
│   │   │   ├── endpoints/
│   │   │   │   ├── predict.py
│   │   │   │   ├── health.py
│   │   │   │   └── models.py
│   │   │   └── api.py
│   │   └── deps.py
│   ├── core/
│   │   ├── config.py
│   │   ├── logging.py
│   │   └── security.py
│   ├── models/
│   │   ├── ml_models.py
│   │   └── schemas.py
│   └── services/
│       ├── prediction.py
│       └── model_manager.py
├── tests/
├── docker/
├── requirements.txt
└── main.py

Key Components

Main Application (main.py)

from contextlib import asynccontextmanager

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from app.api.v1.api import api_router
from app.core.config import settings

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load ML models on startup; the lifespan handler replaces the
    # deprecated @app.on_event("startup") hook
    from app.services.model_manager import ModelManager
    await ModelManager.load_models()
    yield

app = FastAPI(
    title="AI API Service",
    description="Production-ready AI inference API",
    version="1.0.0",
    openapi_url="/api/v1/openapi.json",
    lifespan=lifespan,
)

# CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.ALLOWED_HOSTS,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Include API routes
app.include_router(api_router, prefix="/api/v1")

Prediction Endpoint

from fastapi import APIRouter, Depends, HTTPException
from app.models.schemas import PredictionRequest, PredictionResponse
from app.services.prediction import PredictionService

router = APIRouter()

@router.post("/predict", response_model=PredictionResponse)
async def predict(
    request: PredictionRequest,
    service: PredictionService = Depends()
):
    try:
        result = await service.predict(
            data=request.data,
            model_name=request.model_name
        )
        return PredictionResponse(
            prediction=result.prediction,
            confidence=result.confidence,
            processing_time=result.processing_time,
            # model_version is a required field of PredictionResponse (see schema below)
            model_version=result.model_version
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
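
The PredictionService dependency lives in app/services/prediction.py and is not reproduced on this page. A minimal sketch of what such a service could look like, assuming the ModelManager shown below and a simple result object (field names mirror the response schema; the confidence heuristic and version string are illustrative):

import time
from dataclasses import dataclass
from typing import List

import torch

from app.services.model_manager import ModelManager

@dataclass
class PredictionResult:
    prediction: List[float]
    confidence: float
    processing_time: float
    model_version: str

class PredictionService:
    async def predict(self, data: List[float], model_name: str) -> PredictionResult:
        model = ModelManager.get_model(model_name)
        start = time.perf_counter()
        with torch.no_grad():
            output = model(torch.tensor([data]))  # shape: (1, n_features)
        scores = output.squeeze(0).tolist()
        return PredictionResult(
            prediction=scores,
            confidence=max(scores),  # naive proxy; replace with real calibration
            processing_time=time.perf_counter() - start,
            model_version="v1.0.0",  # placeholder; a real service tracks versions
        )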

Request/Response Schemas

Request Schema

from pydantic import BaseModel
from typing import List, Optional

class PredictionRequest(BaseModel):
    data: List[float]
    model_name: str = "default"
    batch_size: Optional[int] = 1
    
    class Config:
        schema_extra = {
            "example": {
                "data": [1.0, 2.0, 3.0],
                "model_name": "classifier",
                "batch_size": 1
            }
        }

Response Schema

class PredictionResponse(BaseModel):
    prediction: List[float]
    confidence: float
    processing_time: float
    model_version: str
    
    class Config:
        schema_extra = {
            "example": {
                "prediction": [0.8, 0.2],
                "confidence": 0.85,
                "processing_time": 0.023,
                "model_version": "v1.2.0"
            }
        }
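
Both schemas use Pydantic v1 conventions. If you run the template on Pydantic v2, schema_extra is renamed json_schema_extra and the class-based Config is replaced by model_config; fields starting with model_ (here model_name and model_version) also trigger a protected-namespace warning unless you opt out. For reference, a v2 version of the request schema:

from typing import List, Optional

from pydantic import BaseModel, ConfigDict

class PredictionRequest(BaseModel):
    # protected_namespaces=() silences the warning for the model_name field
    model_config = ConfigDict(
        protected_namespaces=(),
        json_schema_extra={
            "example": {
                "data": [1.0, 2.0, 3.0],
                "model_name": "classifier",
                "batch_size": 1,
            }
        },
    )

    data: List[float]
    model_name: str = "default"
    batch_size: Optional[int] = 1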

Model Management

Model Manager Service

import torch
import asyncio
from typing import Dict, Any
from pathlib import Path

class ModelManager:
    _models: Dict[str, Any] = {}
    
    @classmethod
    async def load_models(cls):
        """Load all models on startup"""
        model_configs = {
            "classifier": "models/classifier.pt",
            "regressor": "models/regressor.pt"
        }
        
        for name, path in model_configs.items():
            if Path(path).exists():
                model = torch.load(path, map_location='cpu')
                model.eval()
                cls._models[name] = model
                print(f"Loaded model: {name}")
    
    @classmethod
    def get_model(cls, name: str):
        if name not in cls._models:
            raise ValueError(f"Model {name} not found")
        return cls._models[name]
    
    @classmethod
    def list_models(cls):
        return list(cls._models.keys())
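
One caveat: torch.load here unpickles entire model objects, and since PyTorch 2.6 its weights_only parameter defaults to True, which rejects pickled nn.Module instances. For trusted checkpoints the call has to opt out explicitly, for example:

import torch
from pathlib import Path

path = Path("models/classifier.pt")
# weights_only=False is needed on PyTorch >= 2.6 to unpickle a full nn.Module.
# Only do this for checkpoints you trust: unpickling can execute arbitrary code.
model = torch.load(path, map_location="cpu", weights_only=False)
model.eval()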

API Endpoints

Core Endpoints

  • POST /api/v1/predict
  • GET /api/v1/models
  • GET /api/v1/health
  • GET /api/v1/metrics

Documentation

  • GET /docs - Swagger UI
  • GET /redoc - ReDoc
  • GET /openapi.json - OpenAPI spec
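
As an illustration, here is the predict endpoint called with the example payload from the request schema above (assuming the dev server from the Quick Start is running locally):

import requests

payload = {
    "data": [1.0, 2.0, 3.0],
    "model_name": "classifier",
    "batch_size": 1,
}

resp = requests.post("http://localhost:8000/api/v1/predict", json=payload)
resp.raise_for_status()  # raises on 4xx/5xx responses
print(resp.json())       # {"prediction": [...], "confidence": ..., ...}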

Deployment Options

Docker

Containerized deployment

docker build -t ai-api .
  • Multi-stage builds
  • GPU support
  • Health checks

Kubernetes

Scalable orchestration

kubectl apply -f k8s/
  • Auto-scaling
  • Load balancing
  • Rolling updates

Cloud Run

Serverless deployment

gcloud run deploy
  • Pay per request
  • Auto-scaling
  • HTTPS included

Performance & Monitoring

📊 Metrics Collection

Prometheus metrics for request latency, throughput, and error rates

🚀 Caching Layer

Redis caching for frequently requested predictions

Async Processing

Background task queues for batch processing
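
The template wires these up internally; as a rough sketch of the metrics side using the prometheus_client library (metric names and the middleware shape are illustrative, not the template's actual code):

import time

from fastapi import FastAPI, Request, Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

REQUEST_COUNT = Counter(
    "api_requests_total", "Total HTTP requests", ["method", "path", "status"]
)
REQUEST_LATENCY = Histogram(
    "api_request_latency_seconds", "Request latency in seconds", ["method", "path"]
)

app = FastAPI()

@app.middleware("http")
async def record_metrics(request: Request, call_next):
    # Time every request and record count + latency with basic labels
    start = time.perf_counter()
    response = await call_next(request)
    elapsed = time.perf_counter() - start
    REQUEST_COUNT.labels(request.method, request.url.path, str(response.status_code)).inc()
    REQUEST_LATENCY.labels(request.method, request.url.path).observe(elapsed)
    return response

@app.get("/api/v1/metrics")
def metrics() -> Response:
    # Expose all collected metrics in Prometheus text format
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)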

Build Your AI API

Get started with our production-ready FastAPI template
