From Training to Action: Making Our AI Work in the Real World
Building in Public: Part 3 - Teaching Our AI to Make Decisions 🤖
Hey there, fellow Alchemists! 👋
Last time we made sure our AI kitchen was squeaky clean and ready to go with our verify.py script. Today? We're going to do something SUPER exciting - we're going to make our AI actually work in the real world!
What Do We Mean by "Inference"? 🤔
Okay, let's chat about this word "inference" for a second. I remember the first time I saw it, I thought, "Wow, that sounds complicated!" But here's the thing - it's actually a pretty simple idea. You know how after you learn something, you use that knowledge to make decisions? Like how after you learn to ride a bike, you can look at any bike and think, "Yeah, I can ride that!"
That's exactly what inference is! Our AI has learned some patterns (that's the training part), and now it's going to use that knowledge to make decisions about new images it sees. Cool, right?
Opening Up inference.py 📝
Let's look at our script together:
"""
Inference script for hate content detection model
"""
import pandas as pd
from utils import ImageProcessor, SimpleMetalKMeans
See those imports? We're bringing in two really important tools:
pandas (that's what 'pd' stands for) is like our data organization superhero. Think of it as a super-powered Excel for Python
Our old friends ImageProcessor and SimpleMetalKMeans from utils.py that we talked about before
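Never used pandas before? Here's a tiny, self-contained taste of what it does - the data in it is made up just for this demo, but the column names mirror what we'll build later:
import pandas as pd

# A DataFrame is basically a table: named columns, one row per item
df = pd.DataFrame({
    "image_id": ["cat_01.jpg", "dog_02.jpg"],
    "prediction": [0, 1]
})
print(df)                           # shows the table
df.to_csv("demo.csv", index=False)  # saves it, much like exporting from Excel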
The Main Star: run_inference() 🌟
def run_inference(image_folder, model_path="model/kmeans_model.pkl"):
    """
    Run inference on a folder of images using a trained model.
    """
    print("Starting inference process...")
Let's break this down:
image_folder is where we keep the images we want to analyze
model_path points to our trained model (that .pkl file - think of it as our AI's learned experience)
The "pkl" extension? That stands for "pickle" (yes, really! 😄). It's Python's way of saving complex stuff to a file
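Curious what "loading a pickle" actually looks like? The real SimpleMetalKMeans.load lives in utils.py, so this is just a guess at how a thin wrapper around Python's pickle module might look - a sketch, not the actual code:
import pickle

def load_model(model_path):
    # Read the saved bytes and rebuild the original Python object
    with open(model_path, "rb") as f:
        return pickle.load(f)

# model = load_model("model/kmeans_model.pkl")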
Three Big Steps (Like a Recipe!) 🥘
Step 1: Getting Our Expert Ready 🧑🍳
    try:
        # Load the trained model
        print("\nLoading trained model...")
        model = SimpleMetalKMeans.load(model_path)
This is like getting our expert chef into the kitchen. The try: part? That's like having a safety net - if something goes wrong, we'll catch it and handle it gracefully. We'll talk more about that in a minute!
Step 2: Preparing Our Images 🖼️
        # Initialize image processor
        processor = ImageProcessor()
        # Load and process images
        print("\nProcessing images...")
        images, image_ids = processor.load_images(image_folder)
        # Preprocess data
        data = images.reshape(images.shape[0], -1) / 255.0
Whoa, what's that reshape and 255.0 business? Let me explain:
reshape is like reorganizing your photos in an album. Instead of having a complex 3D structure (width, height, colors), we flatten each image into a simple list of numbers
Dividing by 255.0 is like converting prices from cents to dollars - it scales our pixel values from 0-255 down to 0-1, which our AI prefers
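Here's a quick, standalone NumPy demo of both tricks, using a fake batch of ten 32x32 RGB images (the sizes are made up for illustration - your real images may be different):
import numpy as np

# Pretend we loaded 10 RGB images, each 32x32 pixels, values 0-255
images = np.random.randint(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)
print(images.shape)             # (10, 32, 32, 3)

# Flatten each image into one long row, then scale pixels to 0-1
data = images.reshape(images.shape[0], -1) / 255.0
print(data.shape)               # (10, 3072) because 32 * 32 * 3 = 3072
print(data.min(), data.max())   # everything now sits between 0.0 and 1.0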
Step 3: Decision Time! 🎯
        # Make predictions
        print("\nGenerating predictions...")
        predictions = model.predict(data)
This is the magical moment where our AI looks at each image and makes a decision. But what exactly is it deciding? Remember, we're using this for content detection, so for each image, it's basically asking "Does this look concerning or normal?"
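One nuance worth knowing: K-means doesn't actually know the words "concerning" or "normal" - it just hands back cluster labels, and our training setup decides which label means what. If you want to peek at the raw output, here's a tiny sketch (it uses a dummy predictions array so the snippet runs on its own):
import numpy as np

# predictions is whatever model.predict(data) gave us;
# this dummy stand-in lets the snippet run by itself
predictions = np.array([0, 0, 1, 0, 1, 1, 0])

# Count how many images landed in each cluster
labels, counts = np.unique(predictions, return_counts=True)
for label, count in zip(labels, counts):
    print(f"Cluster {label}: {count} images")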
Making It Bulletproof 🛡️
Here's something I learned the hard way - ALWAYS plan for things to go wrong! Let's add some safety nets:
def run_inference(image_folder, model_path="model/kmeans_model.pkl"):
    try:
        # All our code from before...
        return results_df
    except FileNotFoundError as e:
        print(f"\nOops! Couldn't find a file: {e}")
        print("Check if your model and image folder paths are correct!")
        raise
    except Exception as e:
        print(f"\nSomething unexpected happened: {e}")
        raise
Why all this error handling? Well, let me tell you a story... I once ran an inference job on 10,000 images, and it crashed at image 9,999 with no error handling. 😭 Never again!
Saving Our Results 📊
        # Create results DataFrame
        results_df = pd.DataFrame({
            "image_id": image_ids,
            "prediction": predictions
        })
        # Save to CSV
        output_file = "inference_predictions.csv"
        results_df.to_csv(output_file, index=False)
A DataFrame is like a super-powered spreadsheet in Python. Here we're creating one with two columns:
image_id: to know which image we're talking about
prediction: what our AI thought about it (0 or 1)
We save it as a CSV file (like an Excel file) so we can easily look at it later.
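Once that CSV exists, you (or anyone on the team) can pull the results back in with a couple of lines - handy for a quick sanity check. This assumes you've already run the inference so the file is actually there:
import pandas as pd

# Load the saved predictions back into a DataFrame
results = pd.read_csv("inference_predictions.csv")
print(results.head())                        # first few rows
print(results["prediction"].value_counts())  # how many 0s vs 1s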
Let's Try It Out! 🎮
Want to get your hands dirty? Here's a fun experiment:
Grab some random images (maybe some pet photos?)
Put them in a folder called "test_images"
Run this code:
if __name__ == "__main__":
    inference_folder = "./test_images"
    results = run_inference(inference_folder)
    print("\nResults Preview:")
    print(results.head())  # Show first few results
The Ethics Corner 🎯
Now here's something really important to think about - our model is making decisions that could affect content moderation. That's a big responsibility! We need to consider:
False positives: What happens if we flag normal content as concerning?
False negatives: What if we miss actually concerning content?
Bias: Are we treating all types of content fairly?
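If you ever get a batch of hand-labeled images (a human saying what each one really is), you can measure the first two directly. Here's a rough sketch - the assumption that 1 means "concerning" is just for illustration and depends on how your clusters were mapped:
import numpy as np

def confusion_counts(predictions, true_labels):
    """Count false positives and false negatives, assuming 1 = 'concerning'."""
    predictions = np.asarray(predictions)
    true_labels = np.asarray(true_labels)
    false_positives = np.sum((predictions == 1) & (true_labels == 0))
    false_negatives = np.sum((predictions == 0) & (true_labels == 1))
    return false_positives, false_negatives

# Tiny made-up example: one wrong flag, one miss
fp, fn = confusion_counts([1, 0, 1, 0], [0, 0, 1, 1])
print(f"False positives: {fp}, false negatives: {fn}")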
What Could Go Wrong? 🔧
Let's talk about some common hiccups you might hit:
"Model not found" Error
FileNotFoundError: [Errno 2] No such file or directory: 'model/kmeans_model.pkl'
This usually means you're not running the code from the right folder. Try printing your current directory:
import os
print("I'm looking for files in:", os.getcwd())
2. Memory Issues
If you're processing lots of images, you might run out of memory. Try processing in batches:
# Process 100 images at a time
for i in range(0, len(image_files), 100):
    batch_files = image_files[i:i+100]
    # Process batch...
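If the images themselves fit in memory but prediction is the heavy part, another option is to feed the model smaller slices of the already-preprocessed data and stitch the results back together. A sketch, assuming model.predict is happy with any number of rows at a time:
import numpy as np

def predict_in_batches(model, data, batch_size=100):
    """Run model.predict on slices of rows and glue the results back together."""
    all_predictions = []
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        all_predictions.append(model.predict(batch))
    return np.concatenate(all_predictions)

# predictions = predict_in_batches(model, data, batch_size=100)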
What's Next? 🚀
Next time, we're going to get into something really interesting - adversarial testing! Think of it like trying to fool our AI with optical illusions. We'll see just how robust our model really is!
Share Your Journey! 💭
I'd love to hear about:
What kind of images did you test with?
Any surprising predictions?
Did you modify the code in any interesting ways?
What questions came up as you were working with it?
Remember, we're all learning together! No question is too basic - if you're wondering about something, others probably are too!
Drop a comment below with your experiences, questions, or just to say hi! 👋
#BuildInPublic #MachineLearning #AIInference #PracticalAI
P.S. Running into issues? Share your error messages in the comments - debugging is always more fun together! 🤝