Build Intelligent Content Moderation Systems with AI

October 8, 2025
ViscribeAI Team
Use Case

Discover how to implement automated content moderation for user-generated images using ViscribeAI's image classification and visual question answering capabilities.

Why Automated Image Moderation Matters

Platforms with user-generated content face a critical challenge: moderating millions of images while keeping users safe and upholding community guidelines. Manual moderation is expensive, doesn't scale, and exposes moderators to harmful content. AI-powered moderation offers a solution that is fast and consistent, and that can pre-screen content before human review.

ViscribeAI for Content Moderation

ViscribeAI provides powerful endpoints for building intelligent moderation systems:

  • Classify Image: Categorize images by content type (safe, sensitive, prohibited)
  • Visual Question Answering: Ask specific questions about image content
  • Describe Image: Generate detailed descriptions to understand context

Implementation Guide

1. Setup and Installation

python
# Install the SDK first: pip install viscribe
from viscribe import Client

# Initialize the client
client = Client(api_key="your-api-key-here")

2. Basic Content Classification

Start with simple classification to flag potentially problematic content:

python
def moderate_image(image_url):
    """
    Classify image into safety categories
    """
    result = client.classify_image(
        image_url=image_url,
        classes=[
            "safe - appropriate for all audiences",
            "sensitive - may require age restriction",
            "prohibited - violates community guidelines",
            "spam - promotional or advertising content",
            "other - needs manual review"
        ]
    )

    return {
        "category": result.classification,
        "confidence": result.confidence,
        "action": get_moderation_action(result.classification, result.confidence)
    }

def get_moderation_action(category, confidence):
    """
    Determine action based on classification
    """
    if "prohibited" in category.lower() and confidence > 0.85:
        return "block"
    elif "sensitive" in category.lower() and confidence > 0.75:
        return "flag_for_review"
    elif "spam" in category.lower() and confidence > 0.80:
        return "flag_as_spam"
    elif confidence < 0.70:
        return "manual_review"
    else:
        return "approve"

# Example usage
result = moderate_image("https://example.com/user-upload.jpg")
print(f"Category: {result['category']}")
print(f"Confidence: {result['confidence']:.2f}")
print(f"Action: {result['action']}")

3. Multi-Stage Moderation Pipeline

Combine multiple endpoints for more nuanced moderation:

python
def advanced_moderation(image_url):
    """
    Multi-stage moderation with detailed analysis
    """
    # Stage 1: Initial classification
    classification = client.classify_image(
        image_url=image_url,
        classes=[
            "safe content",
            "adult content",
            "violent content",
            "spam or advertising",
            "medical content",
            "general content"
        ]
    )

    # Stage 2: Get detailed description for context
    description = client.describe_image(
        image_url=image_url,
        generate_tags=True
    )

    # Stage 3: Ask specific moderation questions
    questions = [
        "Does this image contain any people?",
        "Is there any text visible in the image?",
        "Does this appear to be professional or user-generated content?"
    ]

    qa_results = {}
    for question in questions:
        answer = client.ask_image(
            image_url=image_url,
            question=question
        )
        qa_results[question] = answer.answer

    # Combine results for final decision
    return {
        "classification": classification.classification,
        "confidence": classification.confidence,
        "description": description.description,
        "tags": description.tags,
        "qa_insights": qa_results,
        "requires_review": classification.confidence < 0.80 or \
                          "adult" in classification.classification.lower() or \
                          "violent" in classification.classification.lower()
    }

# Example usage
result = advanced_moderation("https://example.com/upload.jpg")

if result["requires_review"]:
    print("⚠️  Flagged for manual review")
else:
    print("✅ Auto-approved")

print(f"\nClassification: {result['classification']}")
print(f"Confidence: {result['confidence']:.2f}")
print(f"Description: {result['description']}")
print(f"\nQ&A Insights:")
for question, answer in result['qa_insights'].items():
    print(f"  Q: {question}")
    print(f"  A: {answer}")

4. Marketplace Product Listing Moderation

Moderate product images on marketplace platforms to ensure compliance:

python
def moderate_product_listing(image_url):
    """
    Specialized moderation for marketplace product images
    """
    # Classify product category
    category = client.classify_image(
        image_url=image_url,
        classes=[
            "legitimate product photo",
            "prohibited item",
            "counterfeit product",
            "inappropriate image",
            "stock photo or screenshot"
        ]
    )

    # Ask compliance questions
    compliance_checks = [
        "Does this appear to be an authentic product photo?",
        "Are there any visible brand logos or trademarks?",
        "Is this a professional product photo or a stock image?"
    ]

    checks = {}
    for question in compliance_checks:
        answer = client.ask_image(image_url=image_url, question=question)
        checks[question] = answer.answer

    # Determine if listing can be approved
    auto_approve = (
        "legitimate" in category.classification.lower() and
        category.confidence > 0.85
    )

    return {
        "category": category.classification,
        "auto_approve": auto_approve,
        "compliance_checks": checks,
        "recommendation": "approve" if auto_approve else "review"
    }

# Example
result = moderate_product_listing("https://example.com/product.jpg")
print(f"Status: {result['recommendation'].upper()}")
print(f"Category: {result['category']}")

5. Social Media Content Screening

Screen user posts on social platforms before they go live:

python
def screen_social_post(image_url, user_reputation_score=0.5):
    """
    Screen social media images with reputation-based thresholds
    """
    # Initial classification
    result = client.classify_image(
        image_url=image_url,
        classes=[
            "safe - appropriate content",
            "questionable - potentially sensitive",
            "unsafe - violates guidelines",
            "spam - promotional content"
        ]
    )

    # Adjust the blocking threshold based on user reputation:
    # trusted users (high reputation) get more lenient treatment, i.e. a
    # higher confidence is required before their content is auto-blocked
    confidence_threshold = 0.85 if user_reputation_score > 0.8 else 0.70

    # Determine action
    if "unsafe" in result.classification.lower():
        if result.confidence > confidence_threshold:
            action = "block"
            message = "Content violates community guidelines"
        else:
            action = "review"
            message = "Content flagged for review"
    elif "questionable" in result.classification.lower():
        if user_reputation_score > 0.8:
            action = "approve_with_warning"
            message = "Content approved with sensitivity warning"
        else:
            action = "review"
            message = "Content needs manual review"
    elif "spam" in result.classification.lower():
        action = "flag_spam"
        message = "Potential spam detected"
    else:
        action = "approve"
        message = "Content approved"

    return {
        "action": action,
        "message": message,
        "classification": result.classification,
        "confidence": result.confidence,
        "user_reputation_factor": user_reputation_score
    }

# Example with high-reputation user
result_trusted = screen_social_post(
    "https://example.com/post.jpg",
    user_reputation_score=0.9
)
print(f"Trusted User: {result_trusted['action']} - {result_trusted['message']}")

# Example with new user
result_new = screen_social_post(
    "https://example.com/post.jpg",
    user_reputation_score=0.3
)
print(f"New User: {result_new['action']} - {result_new['message']}")

Real-World Use Cases

1. Dating Apps

Automatically screen profile photos to ensure they meet community standards and don't contain inappropriate content, spam, or non-person images.
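
As a minimal sketch, this check can reuse the classify-then-ask pattern from the pipeline above (it assumes the `client` from the setup step; the classes and the 0.80 threshold are illustrative, not prescriptive):

python
def screen_profile_photo(image_url):
    """
    Sketch of a profile-photo check: classify the photo, then confirm
    it actually shows a person. Classes and threshold are illustrative.
    """
    category = client.classify_image(
        image_url=image_url,
        classes=[
            "acceptable photo of a person",
            "inappropriate or explicit content",
            "not a person (object, pet, landscape, meme)",
            "spam or promotional content"
        ]
    )

    # Confirm the photo shows an actual person, not an object or meme
    person_check = client.ask_image(
        image_url=image_url,
        question="Does this image clearly show a person's face?"
    )

    return {
        "category": category.classification,
        "person_check": person_check.answer,
        "approve": (
            "acceptable" in category.classification.lower()
            and category.confidence > 0.80
        )
    }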

2. Marketplace Platforms

Validate that product listings contain genuine product photos rather than stock images, screenshots, or prohibited items.

3. Social Networks

Pre-screen user posts before publication to catch policy violations while allowing safe content to post immediately.

4. Educational Platforms

Ensure uploaded images in student work or forum posts are appropriate for educational environments.

5. Job Boards

Verify company logos and screen job posting images to prevent impersonation and spam.

Building a Complete Moderation System

Here's a production-oriented moderation class with logging, metrics, and fail-safe error handling:

python
from viscribe import Client
from datetime import datetime
import logging

class ContentModerator:
    def __init__(self, api_key, confidence_threshold=0.75):
        self.client = Client(api_key=api_key)
        self.threshold = confidence_threshold
        self.logger = logging.getLogger(__name__)

        # Track metrics
        self.metrics = {
            "total_moderated": 0,
            "auto_approved": 0,
            "auto_blocked": 0,
            "flagged_for_review": 0
        }

    def moderate(self, image_url, context=None):
        """
        Moderate an image with full logging and metrics
        """
        start_time = datetime.now()

        try:
            # Perform moderation
            result = self.client.classify_image(
                image_url=image_url,
                classes=[
                    "safe content",
                    "sensitive content",
                    "prohibited content",
                    "spam"
                ]
            )

            # Determine action
            action = self._determine_action(result)

            # Update metrics
            self.metrics["total_moderated"] += 1
            self.metrics[action] = self.metrics.get(action, 0) + 1

            # Log decision
            self.logger.info(
                f"Moderation: {action} | "
                f"Category: {result.classification} | "
                f"Confidence: {result.confidence:.2f} | "
                f"Time: {(datetime.now() - start_time).total_seconds():.2f}s"
            )

            return {
                "action": action,
                "category": result.classification,
                "confidence": result.confidence,
                "timestamp": datetime.now().isoformat(),
                "context": context
            }

        except Exception as e:
            self.logger.error(f"Moderation error: {str(e)}")
            # Fail safe: flag for manual review on error
            return {
                "action": "flagged_for_review",
                "error": str(e),
                "timestamp": datetime.now().isoformat()
            }

    def _determine_action(self, result):
        """Determine moderation action based on classification"""
        category = result.classification.lower()
        confidence = result.confidence

        if "prohibited" in category and confidence > self.threshold:
            return "auto_blocked"
        elif "sensitive" in category or confidence < self.threshold:
            return "flagged_for_review"
        elif "spam" in category and confidence > 0.80:
            return "auto_blocked"
        else:
            return "auto_approved"

    def get_metrics(self):
        """Return moderation metrics"""
        return self.metrics

# Usage
moderator = ContentModerator(api_key="your-key", confidence_threshold=0.80)

# Moderate images
images_to_check = [
    "https://example.com/user1.jpg",
    "https://example.com/user2.jpg",
    "https://example.com/user3.jpg"
]

for image_url in images_to_check:
    result = moderator.moderate(image_url, context={"source": "user_upload"})
    print(f"Image: {image_url} -> Action: {result['action']}")

# Check metrics
print(f"\nModeration Metrics:")
print(moderator.get_metrics())

Best Practices

  • Multi-Stage Approach: Combine classification with Q&A for higher accuracy
  • Confidence Thresholds: Adjust based on risk tolerance and user reputation
  • Human in the Loop: Always have manual review for edge cases
  • Audit Logging: Track all decisions for compliance and improvement
  • Feedback Loop: Use ViscribeAI's feedback endpoint to improve accuracy over time (see the sketch after this list)
  • Context Awareness: Consider user history, platform area, and content type
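
A minimal sketch of such a feedback loop, assuming a hypothetical submit_feedback method on the client — check the ViscribeAI documentation for the actual feedback endpoint and signature:

python
def record_review_outcome(image_url, predicted_category, human_decision):
    """
    Report the human moderator's final decision back to the API.
    NOTE: submit_feedback is an assumed method name, used here for
    illustration only; consult the docs for the real feedback endpoint.
    """
    client.submit_feedback(
        image_url=image_url,
        predicted=predicted_category,
        corrected=human_decision
    )

# Example: a moderator overturned an auto-block
record_review_outcome(
    "https://example.com/user-upload.jpg",
    predicted_category="prohibited content",
    human_decision="safe content"
)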

Handling Edge Cases

Some images are harder to classify automatically. Here's how to handle them:

python
def handle_edge_cases(image_url):
    """
    Special handling for ambiguous content
    """
    # Get initial classification
    result = client.classify_image(
        image_url=image_url,
        classes=["safe", "questionable", "unsafe"]
    )

    # Low confidence = edge case
    if result.confidence < 0.70:
        # Get more context with description and Q&A
        description = client.describe_image(image_url=image_url)

        # Ask clarifying questions
        context_questions = [
            "What is the main subject of this image?",
            "Is there any text or branding visible?",
            "What is the setting or location?"
        ]

        additional_context = {}
        for q in context_questions:
            answer = client.ask_image(image_url=image_url, question=q)
            additional_context[q] = answer.answer

        return {
            "requires_manual_review": True,
            "initial_classification": result.classification,
            "confidence": result.confidence,
            "description": description.description,
            "context": additional_context,
            "priority": "high" if "questionable" in result.classification else "normal"
        }
    else:
        return {
            "requires_manual_review": False,
            "classification": result.classification,
            "confidence": result.confidence
        }
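
Usage mirrors the earlier examples; here the low-confidence path feeds a prioritized review queue (the queue names are illustrative):

python
# Example usage
result = handle_edge_cases("https://example.com/ambiguous-upload.jpg")

if result["requires_manual_review"]:
    # Route by the priority set in handle_edge_cases; queue names are illustrative
    queue = "urgent_review" if result["priority"] == "high" else "standard_review"
    print(f"Sent to {queue}: {result['initial_classification']} "
          f"(confidence {result['confidence']:.2f})")
else:
    print(f"Auto-decided: {result['classification']} "
          f"(confidence {result['confidence']:.2f})")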

Performance and Scalability

For high-volume moderation, use async processing:

python
import asyncio
from viscribe import AsyncClient

async def moderate_batch(image_urls):
    """
    Moderate multiple images concurrently
    """
    client = AsyncClient(api_key="your-api-key-here")

    async def moderate_single(url):
        result = await client.classify_image(
            image_url=url,
            classes=["safe", "sensitive", "prohibited"]
        )
        return {"url": url, "result": result}

    # Process all images concurrently
    tasks = [moderate_single(url) for url in image_urls]
    results = await asyncio.gather(*tasks)

    return results

# Process 1000 images in parallel
image_urls = ["https://example.com/img1.jpg", ...]  # 1000 URLs
results = asyncio.run(moderate_batch(image_urls))
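
Note that launching a thousand coroutines at once can exhaust connection pools or trip API rate limits. A common refinement is to cap concurrency with a semaphore; a minimal sketch (the limit of 20 is an arbitrary example, tune it to your rate limits):

python
async def moderate_batch_bounded(image_urls, max_concurrent=20):
    """
    Moderate images concurrently, but never run more than
    max_concurrent requests at once.
    """
    client = AsyncClient(api_key="your-api-key-here")
    semaphore = asyncio.Semaphore(max_concurrent)

    async def moderate_single(url):
        async with semaphore:  # wait for a free slot before calling the API
            result = await client.classify_image(
                image_url=url,
                classes=["safe", "sensitive", "prohibited"]
            )
            return {"url": url, "result": result}

    return await asyncio.gather(*(moderate_single(url) for url in image_urls))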

Success Metrics

Platforms using ViscribeAI for content moderation report:

  • 70-85% reduction in manual moderation workload
  • Sub-second response times for real-time moderation
  • 95%+ accuracy for clear-cut cases
  • Improved user experience with faster content approval

Get Started

Ready to implement intelligent content moderation? Sign up at dashboard.viscribe.ai to get your API key and free credits. Visit our documentation for more examples and integration guides.
