If you want full control over your AI agent with zero per-message API costs, open-source models are the way to go. Hermes 3 (built on Llama 3.1) is one of the best open-source models for conversational AI - it follows instructions well, supports tool use, and runs on your own hardware or through affordable inference providers.
This guide shows you how to build an Instagram DM agent using Hermes 3 with two deployment options: self-hosted via Ollama (completely free) or cloud-hosted via Together AI (pay-per-token but no hardware needed).
Why Open-Source for Instagram DMs?
| Advantage | Details |
|---|---|
| Zero API fees | Run on your own GPU with Ollama - no per-message cost |
| Full data privacy | Conversations never leave your server |
| No rate limits | You control the throughput |
| Customizable | Fine-tune on your own data for better brand voice |
| No vendor lock-in | Switch models anytime without changing your code |
| Commercial use | Hermes 3 inherits the Llama 3.1 Community License, which permits commercial use |
Trade-offs vs Claude/GPT
| Factor | Open-Source (Hermes 3) | Claude/GPT |
|---|---|---|
| Response quality | Very good for typical DM conversations | Best available |
| Setup complexity | Higher (need GPU or inference provider) | Simple API key |
| Running cost | Free (self-hosted) or very cheap | Pay per token |
| Speed | Depends on hardware | Consistently fast |
| Tool use | Supported (Hermes 3) | Native, polished |
Architecture
Option A: Self-Hosted with Ollama (Free)
Instagram User
|
v
InstantDM (webhook)
|
v
Your Server (Node.js/Python)
|
v
Ollama (running Hermes 3 locally)
|
v
Your Server (sends reply)
|
v
InstantDM API (delivers DM)
Option B: Together AI (Cloud, No GPU Needed)
Instagram User
|
v
InstantDM (webhook)
|
v
Your Server
|
v
Together AI API (Hermes 3 hosted)
|
v
Your Server (sends reply)
|
v
InstantDM API (delivers DM)
What You Need
| Requirement | Option A (Ollama) | Option B (Together AI) |
|---|---|---|
| InstantDM account | Trendsetter+ | Trendsetter+ |
| InstantDM API key | Settings > API | Settings > API |
| GPU server | 8GB+ VRAM for the 8B model, 40GB+ for 70B | Not needed |
| Together AI key | Not needed | From together.ai |
| Server | Node.js or Python | Node.js or Python |
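Both options consume the same `dm_received` webhook event. The shape below is inferred from the fields the handler code reads (`event`, `data.instagram_user_id`, `data.message_text`); treat it as illustrative and check InstantDM's documentation for the authoritative schema:

```python
# Illustrative dm_received payload; field names match what the handlers read
sample_event = {
    "event": "dm_received",
    "data": {
        "instagram_user_id": "1234567890",
        "message_text": "Hi! Do you ship internationally?",
    },
}
```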
Option A: Self-Hosted with Ollama
Install Ollama and Pull Hermes 3
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull Hermes 3 (8B parameter version - needs ~5GB VRAM)
ollama pull hermes3:8b
# Or the larger 70B version for better quality (needs ~40GB VRAM)
ollama pull hermes3:70b
# Test it
ollama run hermes3:8b "Hello, how can I help you today?"
Ollama runs a local API server on http://localhost:11434 that's compatible with the OpenAI API format.
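Because the endpoint speaks the OpenAI chat format, you can smoke-test it from Python with nothing but the standard library. A sketch, assuming Ollama is running locally with `hermes3:8b` already pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_payload(prompt, model="hermes3:8b"):
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(prompt):
    """POST one chat turn to the local Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Run `print(chat("What are your store hours?"))` once the server is up. The same request body works against any OpenAI-compatible endpoint, which is what makes switching providers later a one-line change.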
Node.js Agent with Ollama
const express = require('express');
const crypto = require('crypto');
const app = express();
app.use(express.json({
  // Capture the raw body so signature checks use exactly the bytes InstantDM signed
  verify: (req, _res, buf) => { req.rawBody = buf; },
}));
const INSTANTDM_API_KEY = process.env.INSTANTDM_API_KEY;
const OLLAMA_URL = 'http://localhost:11434/v1/chat/completions';
const conversations = new Map();
const SYSTEM_PROMPT = `You are the customer support assistant for [Your Brand].
Rules:
- Answer product questions accurately
- Keep responses under 280 characters
- Be friendly and conversational
- Never invent information
- No markdown formatting
- If unsure, offer to connect with a human
Product info:
[Your product details here]`;
function verifySignature(req) {
  const sig = req.headers['x-webhook-signature'];
  if (!sig) return false;
  const expected = crypto
    .createHmac('sha256', INSTANTDM_API_KEY)
    .update(req.rawBody ?? JSON.stringify(req.body))
    .digest('hex');
  const sigBuf = Buffer.from(sig);
  const expBuf = Buffer.from(expected);
  // timingSafeEqual throws if lengths differ, so compare lengths first
  return sigBuf.length === expBuf.length && crypto.timingSafeEqual(sigBuf, expBuf);
}
app.post('/webhook/instantdm', async (req, res) => {
  // Ack immediately so InstantDM doesn't retry, then verify and process
  res.status(200).json({ received: true });
  if (!verifySignature(req)) return;
  const { event, data } = req.body;
  if (event === 'dm_received') await handleDM(data);
});
async function handleDM(data) {
const { instagram_user_id, message_text } = data;
if (!conversations.has(instagram_user_id)) {
conversations.set(instagram_user_id, []);
}
const history = conversations.get(instagram_user_id);
history.push({ role: 'user', content: message_text });
if (history.length > 20) history.splice(0, history.length - 20);
try {
const response = await fetch(OLLAMA_URL, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
model: 'hermes3:8b',
messages: [{ role: 'system', content: SYSTEM_PROMPT }, ...history],
max_tokens: 250,
temperature: 0.7,
stream: false,
}),
});
const result = await response.json();
const reply = result.choices[0].message.content;
history.push({ role: 'assistant', content: reply });
await sendReply(instagram_user_id, reply);
} catch (err) {
console.error('Ollama error:', err);
await sendReply(instagram_user_id,
"Hey! Quick technical issue. A team member will follow up soon."
);
}
}
async function sendReply(recipientId, text) {
if (text.length > 1000) text = text.substring(0, 997) + '...';
text = text.replace(/[*_~`#]/g, '');
await fetch('https://api.instantdm.com/api-webhook', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${INSTANTDM_API_KEY}`,
},
body: JSON.stringify({
action: 'send_message',
type: 'dm',
recipient_id: recipientId,
message: { type: 'text', text: text.trim() },
}),
});
}
app.listen(3000, () => console.log('Hermes agent running on port 3000'));
Python Agent with Ollama
import os, hmac, hashlib, threading, requests
from flask import Flask, request, jsonify
app = Flask(__name__)
INSTANTDM_API_KEY = os.environ['INSTANTDM_API_KEY']
OLLAMA_URL = 'http://localhost:11434/v1/chat/completions'
conversations = {}
SYSTEM_PROMPT = """You are the customer support assistant for [Your Brand].
Rules:
- Answer product questions accurately
- Keep responses under 280 characters
- Be friendly and conversational
- Never invent information
- No markdown formatting
- If unsure, offer to connect with a human
Product info:
[Your product details here]"""
def verify_signature(req):
sig = req.headers.get('X-Webhook-Signature', '')
expected = hmac.new(
INSTANTDM_API_KEY.encode(), req.get_data(), hashlib.sha256
).hexdigest()
return hmac.compare_digest(sig, expected)
@app.route('/webhook/instantdm', methods=['POST'])
def webhook():
if not verify_signature(request):
return jsonify({'error': 'Unauthorized'}), 401
payload = request.get_json()
if payload.get('event') == 'dm_received':
threading.Thread(target=handle_dm, args=(payload['data'],)).start()
return jsonify({'received': True}), 200
def handle_dm(data):
user_id = data['instagram_user_id']
message = data['message_text']
if user_id not in conversations:
conversations[user_id] = []
history = conversations[user_id]
history.append({'role': 'user', 'content': message})
if len(history) > 20:
conversations[user_id] = history[-20:]
history = conversations[user_id]
try:
        resp = requests.post(OLLAMA_URL, timeout=120, json={  # generous timeout: local inference can be slow
            'model': 'hermes3:8b',
            'messages': [{'role': 'system', 'content': SYSTEM_PROMPT}] + history,
            'max_tokens': 250,
            'temperature': 0.7,
            'stream': False,
        })
        resp.raise_for_status()
result = resp.json()
reply = result['choices'][0]['message']['content']
history.append({'role': 'assistant', 'content': reply})
send_reply(user_id, reply)
except Exception as e:
print(f'Ollama error: {e}')
send_reply(user_id, "Hey! Quick technical issue. A team member will follow up soon.")
def send_reply(recipient_id, text):
if len(text) > 1000: text = text[:997] + '...'
for ch in ['*','_','~','`','#']: text = text.replace(ch, '')
    requests.post('https://api.instantdm.com/api-webhook', timeout=15,
        headers={'Content-Type': 'application/json', 'Authorization': f'Bearer {INSTANTDM_API_KEY}'},
        json={'action': 'send_message', 'type': 'dm', 'recipient_id': recipient_id,
              'message': {'type': 'text', 'text': text.strip()}})
if __name__ == '__main__':
app.run(port=3000)
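To exercise either webhook locally without real Instagram traffic, compute the same HMAC the handlers verify and POST a fake event. A sketch (the `test-secret` value and local URL are stand-ins for your actual key and deployment):

```python
import hashlib
import hmac
import json

def sign_payload(secret: str, raw_body: bytes) -> str:
    """Reproduce the X-Webhook-Signature header the handlers check."""
    return hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()

body = json.dumps({
    "event": "dm_received",
    "data": {"instagram_user_id": "1234567890", "message_text": "Hi there"},
}).encode()
signature = sign_payload("test-secret", body)

# Then, e.g. with requests (send data=body, not json=..., so the
# bytes on the wire are exactly the bytes you signed):
# requests.post("http://localhost:3000/webhook/instantdm", data=body,
#               headers={"Content-Type": "application/json",
#                        "X-Webhook-Signature": signature})
```

The signature must be computed over the exact bytes you send, which is why the request passes pre-serialized bytes rather than letting the HTTP library re-serialize the dict.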
Option B: Together AI (Cloud-Hosted, No GPU)
Together AI hosts open-source models with an OpenAI-compatible API. You get the benefits of open-source models without managing hardware.
Node.js with Together AI
const OpenAI = require('openai');
// Together AI uses OpenAI-compatible API
const together = new OpenAI({
apiKey: process.env.TOGETHER_API_KEY,
baseURL: 'https://api.together.xyz/v1',
});
async function handleDM(data) {
const { instagram_user_id, message_text } = data;
if (!conversations.has(instagram_user_id)) {
conversations.set(instagram_user_id, []);
}
const history = conversations.get(instagram_user_id);
history.push({ role: 'user', content: message_text });
if (history.length > 20) history.splice(0, history.length - 20);
try {
const completion = await together.chat.completions.create({
model: 'NousResearch/Hermes-3-Llama-3.1-8B',
messages: [{ role: 'system', content: SYSTEM_PROMPT }, ...history],
max_tokens: 250,
temperature: 0.7,
});
const reply = completion.choices[0].message.content;
history.push({ role: 'assistant', content: reply });
await sendReply(instagram_user_id, reply);
} catch (err) {
console.error('Together AI error:', err);
await sendReply(instagram_user_id,
"Hey! Quick technical issue. A team member will follow up soon."
);
}
}
Python with Together AI
from openai import OpenAI
# Together AI uses OpenAI-compatible API
together = OpenAI(
api_key=os.environ['TOGETHER_API_KEY'],
base_url='https://api.together.xyz/v1',
)
def handle_dm(data):
user_id = data['instagram_user_id']
message = data['message_text']
if user_id not in conversations:
conversations[user_id] = []
history = conversations[user_id]
history.append({'role': 'user', 'content': message})
if len(history) > 20:
conversations[user_id] = history[-20:]
history = conversations[user_id]
try:
completion = together.chat.completions.create(
model='NousResearch/Hermes-3-Llama-3.1-8B',
messages=[{'role': 'system', 'content': SYSTEM_PROMPT}] + history,
max_tokens=250,
temperature=0.7,
)
reply = completion.choices[0].message.content
history.append({'role': 'assistant', 'content': reply})
send_reply(user_id, reply)
except Exception as e:
print(f'Together AI error: {e}')
send_reply(user_id, "Hey! Quick technical issue. A team member will follow up soon.")
Other Open-Source Models You Can Use
The code above works with any model available on Ollama or Together AI. Just change the model name:
| Model | Parameters | Quality | Speed | Ollama Name | Together AI Name |
|---|---|---|---|---|---|
| Hermes 3 8B | 8B | Good for simple DMs | Fast | hermes3:8b | NousResearch/Hermes-3-Llama-3.1-8B |
| Hermes 3 70B | 70B | Excellent, near GPT-4 | Slower | hermes3:70b | NousResearch/Hermes-3-Llama-3.1-70B |
| Llama 3.1 8B | 8B | Good general purpose | Fast | llama3.1:8b | meta-llama/Meta-Llama-3.1-8B-Instruct |
| Llama 3.1 70B | 70B | Excellent | Medium | llama3.1:70b | meta-llama/Meta-Llama-3.1-70B-Instruct |
| Mistral 7B | 7B | Good, very fast | Very fast | mistral:7b | mistralai/Mistral-7B-Instruct-v0.3 |
| Qwen 2.5 7B | 7B | Strong multilingual | Fast | qwen2.5:7b | Qwen/Qwen2.5-7B-Instruct |
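Since every backend here speaks the OpenAI chat format, a model switch can be pure configuration. One way to wire that up (the registry below is an illustrative sketch, not an official API of either provider):

```python
import os

# Illustrative registry mapping each provider to its endpoint and model names.
# Ollama's OpenAI-compatible endpoint ignores the API key, so any value works.
PROVIDERS = {
    "ollama":   {"base_url": "http://localhost:11434/v1", "api_key": "ollama"},
    "together": {"base_url": "https://api.together.xyz/v1",
                 "api_key": os.environ.get("TOGETHER_API_KEY", "")},
}
MODELS = {
    ("ollama", "hermes3-8b"):   "hermes3:8b",
    ("together", "hermes3-8b"): "NousResearch/Hermes-3-Llama-3.1-8B",
    ("ollama", "qwen2.5-7b"):   "qwen2.5:7b",
    ("together", "qwen2.5-7b"): "Qwen/Qwen2.5-7B-Instruct",
}

def resolve(provider: str, alias: str):
    """Return (base_url, api_key, model) for an OpenAI-compatible client."""
    cfg = PROVIDERS[provider]
    return cfg["base_url"], cfg["api_key"], MODELS[(provider, alias)]
```

The returned tuple plugs straight into an OpenAI-compatible client constructor, so the agent code itself never hardcodes a provider.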
Recommendation
- Low volume (< 100 DMs/day): Hermes 3 8B on Ollama (free, runs on a decent GPU)
- Medium volume (100-1000 DMs/day): Hermes 3 8B on Together AI (~$0.20 per million tokens)
- High quality needed: Hermes 3 70B on Together AI (~$0.90 per million tokens)
- Multilingual DMs: Qwen 2.5 on Ollama or Together AI
Cost Comparison
| Setup | Monthly Cost (1,000 DMs/day) |
|---|---|
| Ollama on your GPU | $0 (electricity only) |
| Together AI (Hermes 3 8B) | ~$3/month |
| Together AI (Hermes 3 70B) | ~$15/month |
| OpenAI GPT-4o-mini | ~$5/month |
| Anthropic Claude Sonnet | ~$20/month |
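The table's figures follow from straightforward token arithmetic. Assuming roughly 500 tokens per exchange (system prompt plus reply; your real average may differ):

```python
def monthly_cost(dms_per_day: int, tokens_per_dm: int, usd_per_million: float) -> float:
    """Estimated monthly spend for a given DM volume and per-token price."""
    monthly_tokens = dms_per_day * 30 * tokens_per_dm
    return monthly_tokens / 1_000_000 * usd_per_million

# 1,000 DMs/day at ~500 tokens/exchange on Hermes 3 8B (~$0.20/M tokens)
print(monthly_cost(1000, 500, 0.20))  # → 3.0
```

Re-running with the 70B price (~$0.90/M tokens) gives $13.50, in line with the ~$15/month figure above once output-token pricing is included.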
Hermes 3 Tool Use (Function Calling)
Hermes 3 supports tool use natively. The format is similar to OpenAI's function calling:
async function handleDMWithTools(data) {
const { instagram_user_id, message_text } = data;
const history = await getConversation(instagram_user_id);
history.push({ role: 'user', content: message_text });
const tools = [
{
type: 'function',
function: {
name: 'search_products',
description: 'Search the product catalog',
parameters: {
type: 'object',
properties: { query: { type: 'string' } },
required: ['query'],
},
},
},
];
// Together AI supports tool use with Hermes 3
const completion = await together.chat.completions.create({
model: 'NousResearch/Hermes-3-Llama-3.1-8B',
messages: [{ role: 'system', content: SYSTEM_PROMPT }, ...history],
tools: tools,
max_tokens: 300,
});
const message = completion.choices[0].message;
if (message.tool_calls) {
// Execute tool and get result
const call = message.tool_calls[0];
const result = await executeTool(call.function.name, JSON.parse(call.function.arguments));
history.push(message);
history.push({ role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) });
// Get final response with tool result
const final = await together.chat.completions.create({
model: 'NousResearch/Hermes-3-Llama-3.1-8B',
messages: [{ role: 'system', content: SYSTEM_PROMPT }, ...history],
max_tokens: 250,
});
const reply = final.choices[0].message.content;
await sendReply(instagram_user_id, reply);
} else {
await sendReply(instagram_user_id, message.content);
}
}
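`executeTool` is left undefined above. Here is a minimal dispatcher sketch, shown in Python for brevity; the catalog and tool table are hypothetical stand-ins for your real data:

```python
# Hypothetical product data standing in for a real catalog lookup
CATALOG = [
    {"name": "Classic Tee", "price": 25},
    {"name": "Zip Hoodie", "price": 60},
]

def search_products(query: str) -> list:
    """Case-insensitive substring match over the catalog."""
    q = query.lower()
    return [p for p in CATALOG if q in p["name"].lower()]

def execute_tool(name: str, arguments: dict):
    """Route a model-issued tool call to the matching Python function."""
    tools = {"search_products": lambda args: search_products(args["query"])}
    if name not in tools:
        return {"error": f"unknown tool: {name}"}
    return tools[name](arguments)
```

Serialize the return value (`json.dumps` / `JSON.stringify`) before appending it as the `role: 'tool'` message, as the Node example does.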
GPU Requirements for Self-Hosting
| Model | VRAM Needed | Example GPU |
|---|---|---|
| Hermes 3 8B (Q4) | ~5 GB | RTX 3060, RTX 4060 |
| Hermes 3 8B (FP16) | ~16 GB | RTX 4090, A10 |
| Hermes 3 70B (Q4) | ~40 GB | 2x RTX 4090, A100 |
For most Instagram DM use cases, the 8B model with Q4 quantization is more than sufficient and runs on consumer GPUs.
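The VRAM figures follow roughly from weight memory alone: parameter count times bytes per weight, with the KV cache and runtime overhead adding another gigabyte or more on top (a rule of thumb, not a precise sizing):

```python
def weight_vram_gb(params_billion: float, bits_per_weight: int) -> float:
    """GB needed for model weights alone; KV cache and runtime add more on top."""
    return params_billion * bits_per_weight / 8

print(weight_vram_gb(8, 4))    # → 4.0   (~5 GB in practice with overhead)
print(weight_vram_gb(8, 16))   # → 16.0
print(weight_vram_gb(70, 4))   # → 35.0  (~40 GB in practice)
```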
Troubleshooting
| Issue | Fix |
|---|---|
| Ollama not responding | Check if Ollama is running: ollama list |
| Slow responses | Use the 8B model instead of 70B, or switch to Together AI |
| Poor response quality | Try Hermes 3 70B or switch to a larger model |
| Out of VRAM | Use a smaller quantization (Q4 instead of FP16) or switch to Together AI |
| Model not following instructions | Hermes 3 follows system prompts well - make sure your prompt is clear and specific |
What's Next
- Start with Ollama locally to test for free
- Move to Together AI when you need reliability and scale
- Add Redis for conversation persistence
- Connect to your CRM with HubSpot or Google Sheets
- Set up Slack notifications for human handoff