Integration Guide

How to Build an Open-Source AI Agent for Instagram DMs with Hermes and InstantDM

Build a free, self-hosted Instagram DM agent using Hermes 3 (open-source LLM) with Ollama or Together AI. Complete guide with Node.js and Python code, no API fees, full data privacy.


If you want full control over your AI agent with zero per-message API costs, open-source models are the way to go. Hermes 3 (built on Llama 3.1) is one of the best open-source models for conversational AI - it follows instructions well, supports tool use, and runs on your own hardware or through affordable inference providers.

This guide shows you how to build an Instagram DM agent using Hermes 3 with two deployment options: self-hosted via Ollama (completely free) or cloud-hosted via Together AI (pay-per-token but no hardware needed).


Why Open-Source for Instagram DMs?

Advantage | Details
Zero API fees | Run on your own GPU with Ollama - no per-message cost
Full data privacy | Conversations never leave your server
No rate limits | You control the throughput
Customizable | Fine-tune on your own data for better brand voice
No vendor lock-in | Switch models anytime without changing your code
Commercial use | Hermes 3 / Llama 3.1 license allows commercial use

Trade-offs vs Claude/GPT

Factor | Open-Source (Hermes 3) | Claude/GPT
Response quality | Very good (~90% of GPT-4o for DMs) | Best available
Setup complexity | Higher (need GPU or inference provider) | Simple API key
Running cost | Free (self-hosted) or very cheap | Pay per token
Speed | Depends on hardware | Consistently fast
Tool use | Supported (Hermes 3) | Native, polished

Architecture

Option A: Self-Hosted with Ollama (Free)

Instagram User
      |
      v
  InstantDM (webhook)
      |
      v
  Your Server (Node.js/Python)
      |
      v
  Ollama (running Hermes 3 locally)
      |
      v
  Your Server (sends reply)
      |
      v
  InstantDM API (delivers DM)

Option B: Together AI (Cloud, No GPU Needed)

Instagram User
      |
      v
  InstantDM (webhook)
      |
      v
  Your Server
      |
      v
  Together AI API (Hermes 3 hosted)
      |
      v
  Your Server (sends reply)
      |
      v
  InstantDM API (delivers DM)

What You Need

Requirement | Option A (Ollama) | Option B (Together AI)
InstantDM account | Trendsetter+ | Trendsetter+
InstantDM API key | Settings > API | Settings > API
GPU server | 8GB+ VRAM for 8B model, 24GB+ for 70B | Not needed
Together AI key | Not needed | From together.ai
Server | Node.js or Python | Node.js or Python

Option A: Self-Hosted with Ollama

Install Ollama and Pull Hermes 3

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull Hermes 3 (8B parameter version - needs ~5GB VRAM)
ollama pull hermes3:8b

# Or the larger 70B version for better quality (needs ~40GB VRAM)
ollama pull hermes3:70b

# Test it
ollama run hermes3:8b "Hello, how can I help you today?"

Ollama runs a local API server on http://localhost:11434 that's compatible with the OpenAI API format.
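Before wiring up the webhook, you can sanity-check that endpoint from Python. A minimal sketch, assuming Ollama is running on the default port with hermes3:8b pulled (the `build_payload` helper is just for illustration):

```python
import requests

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_payload(user_message, model="hermes3:8b",
                  system="You are a helpful assistant."):
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": 100,
        "stream": False,
    }

if __name__ == "__main__":
    # Requires a running Ollama server; first call may be slow while the
    # model loads into VRAM
    resp = requests.post(OLLAMA_URL,
                         json=build_payload("Say hello in five words."),
                         timeout=120)
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
```

If this prints a greeting, the same request shape will work from your webhook handler.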

Node.js Agent with Ollama

// Requires Node.js 18+ (built-in fetch)
const express = require('express');
const crypto = require('crypto');

const app = express();
// Capture the raw body so the HMAC is computed over the exact bytes
// InstantDM signed, not a re-serialized copy of the parsed JSON
app.use(express.json({
  verify: (req, res, buf) => { req.rawBody = buf; },
}));

const INSTANTDM_API_KEY = process.env.INSTANTDM_API_KEY;
const OLLAMA_URL = 'http://localhost:11434/v1/chat/completions';
const conversations = new Map();

const SYSTEM_PROMPT = `You are the customer support assistant for [Your Brand].

Rules:
- Answer product questions accurately
- Keep responses under 280 characters
- Be friendly and conversational
- Never invent information
- No markdown formatting
- If unsure, offer to connect with a human

Product info:
[Your product details here]`;

function verifySignature(req) {
  const sig = req.headers['x-webhook-signature'];
  if (!sig) return false;
  const expected = crypto
    .createHmac('sha256', INSTANTDM_API_KEY)
    .update(req.rawBody)
    .digest('hex');
  const sigBuf = Buffer.from(sig);
  const expectedBuf = Buffer.from(expected);
  // timingSafeEqual throws if the buffers differ in length, so guard first
  if (sigBuf.length !== expectedBuf.length) return false;
  return crypto.timingSafeEqual(sigBuf, expectedBuf);
}

app.post('/webhook/instantdm', async (req, res) => {
  // Ack immediately so the webhook isn't retried, then process asynchronously
  res.status(200).json({ received: true });
  if (!verifySignature(req)) return;
  const { event, data } = req.body;
  if (event === 'dm_received') await handleDM(data);
});

async function handleDM(data) {
  const { instagram_user_id, message_text } = data;

  if (!conversations.has(instagram_user_id)) {
    conversations.set(instagram_user_id, []);
  }
  const history = conversations.get(instagram_user_id);
  history.push({ role: 'user', content: message_text });
  if (history.length > 20) history.splice(0, history.length - 20);

  try {
    const response = await fetch(OLLAMA_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'hermes3:8b',
        messages: [{ role: 'system', content: SYSTEM_PROMPT }, ...history],
        max_tokens: 250,
        temperature: 0.7,
        stream: false,
      }),
    });

    const result = await response.json();
    const reply = result.choices[0].message.content;
    history.push({ role: 'assistant', content: reply });
    await sendReply(instagram_user_id, reply);
  } catch (err) {
    console.error('Ollama error:', err);
    await sendReply(instagram_user_id,
      "Hey! Quick technical issue. A team member will follow up soon."
    );
  }
}

async function sendReply(recipientId, text) {
  if (text.length > 1000) text = text.substring(0, 997) + '...';
  text = text.replace(/[*_~`#]/g, '');

  await fetch('https://api.instantdm.com/api-webhook', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${INSTANTDM_API_KEY}`,
    },
    body: JSON.stringify({
      action: 'send_message',
      type: 'dm',
      recipient_id: recipientId,
      message: { type: 'text', text: text.trim() },
    }),
  });
}

app.listen(3000, () => console.log('Hermes agent running on port 3000'));

Python Agent with Ollama

import os, hmac, hashlib, threading, requests
from flask import Flask, request, jsonify

app = Flask(__name__)
INSTANTDM_API_KEY = os.environ['INSTANTDM_API_KEY']
OLLAMA_URL = 'http://localhost:11434/v1/chat/completions'
conversations = {}

SYSTEM_PROMPT = """You are the customer support assistant for [Your Brand].

Rules:
- Answer product questions accurately
- Keep responses under 280 characters
- Be friendly and conversational
- Never invent information
- No markdown formatting
- If unsure, offer to connect with a human

Product info:
[Your product details here]"""

def verify_signature(req):
    sig = req.headers.get('X-Webhook-Signature', '')
    expected = hmac.new(
        INSTANTDM_API_KEY.encode(), req.get_data(), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(sig, expected)

@app.route('/webhook/instantdm', methods=['POST'])
def webhook():
    if not verify_signature(request):
        return jsonify({'error': 'Unauthorized'}), 401
    payload = request.get_json()
    if payload.get('event') == 'dm_received':
        threading.Thread(target=handle_dm, args=(payload['data'],)).start()
    return jsonify({'received': True}), 200

def handle_dm(data):
    user_id = data['instagram_user_id']
    message = data['message_text']

    if user_id not in conversations:
        conversations[user_id] = []
    history = conversations[user_id]
    history.append({'role': 'user', 'content': message})
    if len(history) > 20:
        conversations[user_id] = history[-20:]
        history = conversations[user_id]

    try:
        resp = requests.post(OLLAMA_URL, json={
            'model': 'hermes3:8b',
            'messages': [{'role': 'system', 'content': SYSTEM_PROMPT}] + history,
            'max_tokens': 250,
            'temperature': 0.7,
            'stream': False,
        }, timeout=120)  # local inference can be slow on first model load
        resp.raise_for_status()
        result = resp.json()
        reply = result['choices'][0]['message']['content']
        history.append({'role': 'assistant', 'content': reply})
        send_reply(user_id, reply)
    except Exception as e:
        print(f'Ollama error: {e}')
        send_reply(user_id, "Hey! Quick technical issue. A team member will follow up soon.")

def send_reply(recipient_id, text):
    if len(text) > 1000:
        text = text[:997] + '...'
    for ch in ['*', '_', '~', '`', '#']:
        text = text.replace(ch, '')
    requests.post('https://api.instantdm.com/api-webhook',
        headers={'Content-Type': 'application/json', 'Authorization': f'Bearer {INSTANTDM_API_KEY}'},
        json={'action': 'send_message', 'type': 'dm', 'recipient_id': recipient_id,
              'message': {'type': 'text', 'text': text.strip()}},
        timeout=15)

if __name__ == '__main__':
    app.run(port=3000)

Option B: Together AI (Cloud-Hosted, No GPU)

Together AI hosts open-source models with an OpenAI-compatible API. You get the benefits of open-source models without managing hardware.

Node.js with Together AI

const OpenAI = require('openai');

// Together AI exposes an OpenAI-compatible API, so the official openai
// client works; reuse the Express webhook and sendReply() from Option A
const together = new OpenAI({
  apiKey: process.env.TOGETHER_API_KEY,
  baseURL: 'https://api.together.xyz/v1',
});

async function handleDM(data) {
  const { instagram_user_id, message_text } = data;

  if (!conversations.has(instagram_user_id)) {
    conversations.set(instagram_user_id, []);
  }
  const history = conversations.get(instagram_user_id);
  history.push({ role: 'user', content: message_text });
  if (history.length > 20) history.splice(0, history.length - 20);

  try {
    const completion = await together.chat.completions.create({
      model: 'NousResearch/Hermes-3-Llama-3.1-8B',
      messages: [{ role: 'system', content: SYSTEM_PROMPT }, ...history],
      max_tokens: 250,
      temperature: 0.7,
    });

    const reply = completion.choices[0].message.content;
    history.push({ role: 'assistant', content: reply });
    await sendReply(instagram_user_id, reply);
  } catch (err) {
    console.error('Together AI error:', err);
    await sendReply(instagram_user_id,
      "Hey! Quick technical issue. A team member will follow up soon."
    );
  }
}

Python with Together AI

import os

from openai import OpenAI

# Together AI exposes an OpenAI-compatible API; reuse the Flask webhook
# and send_reply() from Option A
together = OpenAI(
    api_key=os.environ['TOGETHER_API_KEY'],
    base_url='https://api.together.xyz/v1',
)

def handle_dm(data):
    user_id = data['instagram_user_id']
    message = data['message_text']

    if user_id not in conversations:
        conversations[user_id] = []
    history = conversations[user_id]
    history.append({'role': 'user', 'content': message})
    if len(history) > 20:
        conversations[user_id] = history[-20:]
        history = conversations[user_id]

    try:
        completion = together.chat.completions.create(
            model='NousResearch/Hermes-3-Llama-3.1-8B',
            messages=[{'role': 'system', 'content': SYSTEM_PROMPT}] + history,
            max_tokens=250,
            temperature=0.7,
        )
        reply = completion.choices[0].message.content
        history.append({'role': 'assistant', 'content': reply})
        send_reply(user_id, reply)
    except Exception as e:
        print(f'Together AI error: {e}')
        send_reply(user_id, "Hey! Quick technical issue. A team member will follow up soon.")

Other Open-Source Models You Can Use

The code above works with any model available on Ollama or Together AI. Just change the model name:

Model | Parameters | Quality | Speed | Ollama Name | Together AI Name
Hermes 3 8B | 8B | Good for simple DMs | Fast | hermes3:8b | NousResearch/Hermes-3-Llama-3.1-8B
Hermes 3 70B | 70B | Excellent, near GPT-4 | Slower | hermes3:70b | NousResearch/Hermes-3-Llama-3.1-70B
Llama 3.1 8B | 8B | Good general purpose | Fast | llama3.1:8b | meta-llama/Meta-Llama-3.1-8B-Instruct
Llama 3.1 70B | 70B | Excellent | Medium | llama3.1:70b | meta-llama/Meta-Llama-3.1-70B-Instruct
Mistral 7B | 7B | Good, very fast | Very fast | mistral:7b | mistralai/Mistral-7B-Instruct-v0.3
Qwen 2.5 7B | 7B | Strong multilingual | Fast | qwen2.5:7b | Qwen/Qwen2.5-7B-Instruct
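To swap models without editing every call site, the table above can be encoded as a small lookup. A sketch (the friendly names are made up for this example; verify the model identifiers against your provider's current catalog):

```python
# Friendly name -> (Ollama tag, Together AI model ID), per the table above
MODELS = {
    "hermes3-8b":  ("hermes3:8b",  "NousResearch/Hermes-3-Llama-3.1-8B"),
    "hermes3-70b": ("hermes3:70b", "NousResearch/Hermes-3-Llama-3.1-70B"),
    "llama3.1-8b": ("llama3.1:8b", "meta-llama/Meta-Llama-3.1-8B-Instruct"),
    "mistral-7b":  ("mistral:7b",  "mistralai/Mistral-7B-Instruct-v0.3"),
    "qwen2.5-7b":  ("qwen2.5:7b",  "Qwen/Qwen2.5-7B-Instruct"),
}

def model_name(friendly, backend):
    """Resolve a friendly model name for 'ollama' or 'together'."""
    ollama_tag, together_id = MODELS[friendly]
    return ollama_tag if backend == "ollama" else together_id
```

Your handler can then read the friendly name from an environment variable and stay unchanged when you switch backends.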

Recommendation

  • Low volume (< 100 DMs/day): Hermes 3 8B on Ollama (free, runs on a decent GPU)
  • Medium volume (100-1000 DMs/day): Hermes 3 8B on Together AI (~$0.20 per million tokens)
  • High quality needed: Hermes 3 70B on Together AI (~$0.90 per million tokens)
  • Multilingual DMs: Qwen 2.5 on Ollama or Together AI

Cost Comparison

Setup | Monthly Cost (1,000 DMs/day)
Ollama on your GPU | $0 (electricity only)
Together AI (Hermes 3 8B) | ~$3/month
Together AI (Hermes 3 70B) | ~$15/month
OpenAI GPT-4o-mini | ~$5/month
Anthropic Claude Sonnet | ~$20/month
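The Together AI figures can be sanity-checked with rough token math. A sketch, assuming ~450 tokens per DM exchange (roughly 300 in, 150 out - an assumption; your actual usage depends on prompt and history length):

```python
def monthly_cost(dms_per_day, tokens_per_dm, price_per_million):
    """Rough monthly token cost in dollars for a given DM volume."""
    tokens_per_month = dms_per_day * tokens_per_dm * 30
    return tokens_per_month / 1_000_000 * price_per_million

# 1,000 DMs/day at $0.20 per million tokens (Hermes 3 8B on Together AI)
print(round(monthly_cost(1000, 450, 0.20), 2))  # -> 2.7, i.e. the ~$3/month above
```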

Hermes 3 Tool Use (Function Calling)

Hermes 3 supports tool use natively. The format is similar to OpenAI's function calling:

async function handleDMWithTools(data) {
  const { instagram_user_id, message_text } = data;
  const history = await getConversation(instagram_user_id); // your persistence layer (e.g. the Map from earlier)
  history.push({ role: 'user', content: message_text });

  const tools = [
    {
      type: 'function',
      function: {
        name: 'search_products',
        description: 'Search the product catalog',
        parameters: {
          type: 'object',
          properties: { query: { type: 'string' } },
          required: ['query'],
        },
      },
    },
  ];

  // Together AI supports tool use with Hermes 3
  const completion = await together.chat.completions.create({
    model: 'NousResearch/Hermes-3-Llama-3.1-8B',
    messages: [{ role: 'system', content: SYSTEM_PROMPT }, ...history],
    tools: tools,
    max_tokens: 300,
  });

  const message = completion.choices[0].message;

  if (message.tool_calls) {
    // Execute tool and get result
    const call = message.tool_calls[0];
    const result = await executeTool(call.function.name, JSON.parse(call.function.arguments));

    history.push(message);
    history.push({ role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) });

    // Get final response with tool result
    const final = await together.chat.completions.create({
      model: 'NousResearch/Hermes-3-Llama-3.1-8B',
      messages: [{ role: 'system', content: SYSTEM_PROMPT }, ...history],
      max_tokens: 250,
    });

    const reply = final.choices[0].message.content;
    await sendReply(instagram_user_id, reply);
  } else {
    await sendReply(instagram_user_id, message.content);
  }
}
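The `executeTool` dispatcher above is left to you. A minimal sketch of a search_products handler, shown in Python to match the earlier Python examples (the catalog, field names, and matching logic here are all hypothetical placeholders for your real product database):

```python
# Hypothetical in-memory catalog; replace with your real product data
CATALOG = [
    {"name": "Trail Hoodie", "price": 59.00, "tags": ["hoodie", "outdoor"]},
    {"name": "City Tote",    "price": 34.00, "tags": ["bag", "tote"]},
]

def search_products(query):
    """Naive keyword match over product names and tags."""
    q = query.lower()
    return [p for p in CATALOG
            if q in p["name"].lower() or any(q in t for t in p["tags"])]

def execute_tool(name, args):
    """Dispatch a model tool call to the matching handler."""
    if name == "search_products":
        return search_products(args["query"])
    return {"error": f"unknown tool: {name}"}
```

Whatever `execute_tool` returns gets serialized into the `role: 'tool'` message, so keep the results small - the model has to read them within your token budget.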

GPU Requirements for Self-Hosting

Model | VRAM Needed | Example GPU
Hermes 3 8B (Q4) | ~5 GB | RTX 3060, RTX 4060
Hermes 3 8B (FP16) | ~16 GB | RTX 4080, A10
Hermes 3 70B (Q4) | ~40 GB | 2x RTX 4090, A100

For most Instagram DM use cases, the 8B model with Q4 quantization is more than sufficient and runs on consumer GPUs.
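The VRAM figures follow from a rule of thumb: bytes per parameter times parameter count for the weights, plus headroom for the KV cache and runtime. A sketch (the per-quantization byte counts are approximations):

```python
# Approximate bytes per parameter at each quantization level
BYTES_PER_PARAM = {"q4": 0.5, "q8": 1.0, "fp16": 2.0}

def weight_gb(params_billion, quant):
    """Approximate size of the model weights alone, in GB."""
    return params_billion * BYTES_PER_PARAM[quant]

# Add roughly 10-25% on top for KV cache and runtime overhead
print(weight_gb(8, "q4"))    # 4.0 -> ~5 GB in practice
print(weight_gb(8, "fp16"))  # 16.0
print(weight_gb(70, "q4"))   # 35.0 -> ~40 GB in practice
```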


Troubleshooting

Issue | Fix
Ollama not responding | Check that the Ollama server is running: ollama list
Slow responses | Use the 8B model instead of 70B, or switch to Together AI
Poor response quality | Try Hermes 3 70B or switch to a larger model
Out of VRAM | Use a smaller quantization (Q4 instead of FP16) or switch to Together AI
Model not following instructions | Hermes 3 follows system prompts well - make sure your prompt is clear and specific

What's Next

  • Start with Ollama locally to test for free
  • Move to Together AI when you need reliability and scale
  • Add Redis for conversation persistence
  • Connect to your CRM with HubSpot or Google Sheets
  • Set up Slack notifications for human handoff
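On the Redis point: the in-memory Map/dict used throughout loses all conversations on restart. A minimal Redis-backed sketch (the `conv:` key prefix, 24h expiry, and 20-message cap are assumptions carried over from the examples; `client` is anything exposing get/set, e.g. `redis.Redis()`):

```python
import json

MAX_TURNS = 20  # same history cap as the in-memory examples

def load_history(client, user_id):
    """Fetch a user's conversation history, or an empty list."""
    raw = client.get(f"conv:{user_id}")
    return json.loads(raw) if raw else []

def save_history(client, user_id, history):
    """Persist the last MAX_TURNS messages, expiring after 24h of inactivity."""
    client.set(f"conv:{user_id}", json.dumps(history[-MAX_TURNS:]), ex=86400)
```

Swap `conversations[user_id]` reads and writes in `handle_dm` for these two calls and conversations survive restarts and multiple server processes.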

Ready to Automate Your Instagram DMs?

Join 30,000+ creators and brands using InstantDM today.

Start Your Free Trial

No credit card required. Setup in under 15 minutes.