Images and PDFs
OpenRouter supports sending both images and PDF files via the API. This guide shows how to work with both file types using our API.
Both images and PDFs also work in the chat room.
Image Inputs

Requests with images, to multimodal models, are available via the /v1/chat/completions API with a multi-part messages parameter. The image_url can either be a URL or a base64-encoded image. Note that multiple images can be sent by adding multiple entries to the content array. The number of images you can send in a single request varies per provider and per model. Because of how the content is parsed, we recommend sending the text prompt first, then the images. If the images must come first, we recommend putting them in the system prompt.

Using Image URLs

Here's how to send an image using a URL:
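As a sketch of the recommended ordering (text prompt first, then images), a content array with multiple image entries might be assembled like this (the image URLs are placeholders, not real assets):

```python
# Build a user message whose content array holds one text part
# followed by several image parts (text first, as recommended).
image_urls = [
    "https://example.com/first.png",   # placeholder URLs
    "https://example.com/second.png",
]

content = [{"type": "text", "text": "Compare these images."}]
for u in image_urls:
    content.append({"type": "image_url", "image_url": {"url": u}})

messages = [{"role": "user", "content": content}]
print(len(messages[0]["content"]))  # 3 parts: one text entry + two images
```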
- Python
- TypeScript
```python
import requests
import json

url = "https://openrouter.co/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY_REF}",
    "Content-Type": "application/json"
}

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                }
            }
        ]
    }
]

payload = {
    "model": "google/gemini-2.5-flash",
    "messages": messages
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
```
```typescript
const response = await fetch('https://openrouter.co/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer <OPENROUTER_API_KEY>`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'google/gemini-2.5-flash',
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: "What's in this image?",
          },
          {
            type: 'image_url',
            image_url: {
              url: 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg',
            },
          },
        ],
      },
    ],
  }),
});

const data = await response.json();
console.log(data);
```
Using Base64 Encoded Images

For locally stored images, you can send them to the model using base64 encoding. Here's how:
- Python
- TypeScript
```python
import requests
import json
import base64
from pathlib import Path

def encode_image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

url = "https://openrouter.co/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY_REF}",
    "Content-Type": "application/json"
}

# Read and encode the image
image_path = "path/to/your/image.jpg"
base64_image = encode_image_to_base64(image_path)
data_url = f"data:image/jpeg;base64,{base64_image}"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": data_url
                }
            }
        ]
    }
]

payload = {
    "model": "google/gemini-2.5-flash",
    "messages": messages
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
```
```typescript
import fs from 'fs';

async function encodeImageToBase64(imagePath: string): Promise<string> {
  const imageBuffer = await fs.promises.readFile(imagePath);
  const base64Image = imageBuffer.toString('base64');
  return `data:image/jpeg;base64,${base64Image}`;
}

// Read and encode the image
const imagePath = 'path/to/your/image.jpg';
const base64Image = await encodeImageToBase64(imagePath);

const response = await fetch('https://openrouter.co/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${API_KEY_REF}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: '{{MODEL}}',
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: "What's in this image?",
          },
          {
            type: 'image_url',
            image_url: {
              url: base64Image,
            },
          },
        ],
      },
    ],
  }),
});

const data = await response.json();
console.log(data);
```
Supported image content types are:

- image/png
- image/jpeg
- image/webp
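Since only these three content types are supported, it can help to validate the file extension before encoding and pick the matching data-URL prefix. A minimal sketch (the helper name and mapping are ours, not part of the API):

```python
# Map the supported image extensions to their content types.
SUPPORTED_IMAGE_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}

def data_url_prefix(filename: str) -> str:
    """Return the data-URL prefix for a supported image, or raise."""
    ext = "." + filename.rsplit(".", 1)[-1].lower()
    try:
        return f"data:{SUPPORTED_IMAGE_TYPES[ext]};base64,"
    except KeyError:
        raise ValueError(f"unsupported image type: {filename}")

print(data_url_prefix("photo.JPG"))  # data:image/jpeg;base64,
```

Prepending this prefix to the base64 string yields the same `data_url` shape used in the examples above.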
PDF Support

OpenRouter supports PDF processing through the /v1/chat/completions API. PDFs can be sent as base64-encoded data URLs in the messages array, via the file content type. This feature works on any model on OpenRouter.
When a model supports file input natively, the PDF is passed directly to the model. When the model does not support file input natively, OpenRouter will parse the file and pass the parsed results to the requested model.
Note that multiple PDFs can be sent in separate content array entries. The number of PDFs you can send in a single request varies per provider and per model. Because of how the content is parsed, we recommend sending the text prompt first, then the PDF. If the PDF must come first, we recommend putting it in the system prompt.
Plugin Configuration

To configure PDF processing, use the plugins parameter in your request. OpenRouter provides several PDF processing engines with different capabilities and pricing:
```
{
  plugins: [
    {
      id: 'file-parser',
      pdf: {
        engine: 'pdf-text', // or 'mistral-ocr' or 'native'
      },
    },
  ],
}
```
Pricing

OpenRouter provides several PDF processing engines:

- "mistral-ocr": Best for scanned documents or PDFs with images ($2 per 1,000 pages).
- "pdf-text": Best for well-structured PDFs with clear text content (free).
- "native": Only available for models that support file input natively (charged as input tokens).

If you don't explicitly specify an engine, OpenRouter will first try the model's native file processing capabilities, and if that is not available, it will fall back to the "mistral-ocr" engine.
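To make the mistral-ocr pricing concrete: at $2 per 1,000 pages, each page costs $0.002, so a 50-page PDF parsed once costs $0.10. A back-of-the-envelope sketch (parsing cost only; token charges for the parsed text are separate):

```python
MISTRAL_OCR_USD_PER_1000_PAGES = 2.00

def ocr_cost_usd(pages: int) -> float:
    """Parsing cost for a PDF processed with the mistral-ocr engine."""
    return pages * MISTRAL_OCR_USD_PER_1000_PAGES / 1000

print(ocr_cost_usd(50))   # 0.1
print(ocr_cost_usd(500))  # 1.0
```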
Processing PDFs

Here's how to send and process a PDF:
- Python
- TypeScript
```python
import requests
import json
import base64
from pathlib import Path

def encode_pdf_to_base64(pdf_path):
    with open(pdf_path, "rb") as pdf_file:
        return base64.b64encode(pdf_file.read()).decode('utf-8')

url = "https://openrouter.co/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY_REF}",
    "Content-Type": "application/json"
}

# Read and encode the PDF
pdf_path = "path/to/your/document.pdf"
base64_pdf = encode_pdf_to_base64(pdf_path)
data_url = f"data:application/pdf;base64,{base64_pdf}"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What are the main points in this document?"
            },
            {
                "type": "file",
                "file": {
                    "filename": "document.pdf",
                    "file_data": data_url
                }
            },
        ]
    }
]

# Optional: Configure PDF processing engine
# PDF parsing will still work even if the plugin is not explicitly set
plugins = [
    {
        "id": "file-parser",
        "pdf": {
            "engine": "pdf-text"  # defaults to "mistral-ocr". See Pricing above
        }
    }
]

payload = {
    "model": "google/gemma-3-27b-it",
    "messages": messages,
    "plugins": plugins
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
```
```typescript
import fs from 'fs';

async function encodePDFToBase64(pdfPath: string): Promise<string> {
  const pdfBuffer = await fs.promises.readFile(pdfPath);
  const base64PDF = pdfBuffer.toString('base64');
  return `data:application/pdf;base64,${base64PDF}`;
}

// Read and encode the PDF
const pdfPath = 'path/to/your/document.pdf';
const base64PDF = await encodePDFToBase64(pdfPath);

const response = await fetch('https://openrouter.co/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${API_KEY_REF}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: '{{MODEL}}',
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: 'What are the main points in this document?',
          },
          {
            type: 'file',
            file: {
              filename: 'document.pdf',
              file_data: base64PDF,
            },
          },
        ],
      },
    ],
    // Optional: Configure PDF processing engine
    // PDF parsing will still work even if the plugin is not explicitly set
    plugins: [
      {
        id: 'file-parser',
        pdf: {
          engine: '{{ENGINE}}', // defaults to "{{DEFAULT_PDF_ENGINE}}". See Pricing above
        },
      },
    ],
  }),
});

const data = await response.json();
console.log(data);
```
Skip Parsing Costs

When you send a PDF to the API, the response may include file annotations in the assistant's message. These annotations contain structured information about the PDF document that was parsed. By sending these annotations back in subsequent requests, you can avoid re-parsing the same PDF document multiple times, which saves both processing time and costs.

Here's how to reuse file annotations:
- Python
- TypeScript
```python
import requests
import json
import base64
from pathlib import Path

# First, encode and send the PDF
def encode_pdf_to_base64(pdf_path):
    with open(pdf_path, "rb") as pdf_file:
        return base64.b64encode(pdf_file.read()).decode('utf-8')

url = "https://openrouter.co/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY_REF}",
    "Content-Type": "application/json"
}

# Read and encode the PDF
pdf_path = "path/to/your/document.pdf"
base64_pdf = encode_pdf_to_base64(pdf_path)
data_url = f"data:application/pdf;base64,{base64_pdf}"

# Initial request with the PDF
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What are the main points in this document?"
            },
            {
                "type": "file",
                "file": {
                    "filename": "document.pdf",
                    "file_data": data_url
                }
            },
        ]
    }
]

payload = {
    "model": "google/gemma-3-27b-it",
    "messages": messages
}

response = requests.post(url, headers=headers, json=payload)
response_data = response.json()

# Store the annotations from the response
file_annotations = None
if response_data.get("choices") and len(response_data["choices"]) > 0:
    if "annotations" in response_data["choices"][0]["message"]:
        file_annotations = response_data["choices"][0]["message"]["annotations"]

# Follow-up request using the annotations (without sending the PDF again)
if file_annotations:
    follow_up_messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What are the main points in this document?"
                },
                {
                    "type": "file",
                    "file": {
                        "filename": "document.pdf",
                        "file_data": data_url
                    }
                }
            ]
        },
        {
            "role": "assistant",
            "content": "The document contains information about...",
            "annotations": file_annotations
        },
        {
            "role": "user",
            "content": "Can you elaborate on the second point?"
        }
    ]

    follow_up_payload = {
        "model": "google/gemma-3-27b-it",
        "messages": follow_up_messages
    }

    follow_up_response = requests.post(url, headers=headers, json=follow_up_payload)
    print(follow_up_response.json())
```
```typescript
import fs from 'fs/promises';
import fetch from 'node-fetch';

async function encodePDFToBase64(pdfPath: string): Promise<string> {
  const pdfBuffer = await fs.readFile(pdfPath);
  const base64PDF = pdfBuffer.toString('base64');
  return `data:application/pdf;base64,${base64PDF}`;
}

// Initial request with the PDF
async function processDocument() {
  // Read and encode the PDF
  const pdfPath = 'path/to/your/document.pdf';
  const base64PDF = await encodePDFToBase64(pdfPath);

  const initialResponse = await fetch(
    'https://openrouter.co/v1/chat/completions',
    {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${API_KEY_REF}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: '{{MODEL}}',
        messages: [
          {
            role: 'user',
            content: [
              {
                type: 'text',
                text: 'What are the main points in this document?',
              },
              {
                type: 'file',
                file: {
                  filename: 'document.pdf',
                  file_data: base64PDF,
                },
              },
            ],
          },
        ],
      }),
    },
  );

  const initialData = await initialResponse.json();

  // Store the annotations from the response
  let fileAnnotations = null;
  if (initialData.choices && initialData.choices.length > 0) {
    if (initialData.choices[0].message.annotations) {
      fileAnnotations = initialData.choices[0].message.annotations;
    }
  }

  // Follow-up request using the annotations (without sending the PDF again)
  if (fileAnnotations) {
    const followUpResponse = await fetch(
      'https://openrouter.co/v1/chat/completions',
      {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${API_KEY_REF}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model: '{{MODEL}}',
          messages: [
            {
              role: 'user',
              content: [
                {
                  type: 'text',
                  text: 'What are the main points in this document?',
                },
                {
                  type: 'file',
                  file: {
                    filename: 'document.pdf',
                    file_data: base64PDF,
                  },
                },
              ],
            },
            {
              role: 'assistant',
              content: 'The document contains information about...',
              annotations: fileAnnotations,
            },
            {
              role: 'user',
              content: 'Can you elaborate on the second point?',
            },
          ],
        }),
      },
    );

    const followUpData = await followUpResponse.json();
    console.log(followUpData);
  }
}

processDocument();
```
When you include the file annotations from a previous response in your subsequent requests, OpenRouter will use this pre-parsed information instead of re-parsing the PDF, which saves processing time and costs. This is especially beneficial for large documents, or when using the mistral-ocr engine, which incurs additional costs.
Response Format

The API will return a response in the following format:

```json
{
  "id": "gen-1234567890",
  "provider": "DeepInfra",
  "model": "google/gemma-3-27b-it",
  "object": "chat.completion",
  "created": 1234567890,
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The document discusses..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 1000,
    "completion_tokens": 100,
    "total_tokens": 1100
  }
}
```
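Given a response shaped like the JSON above, the assistant's text and the token count can be pulled out defensively. A sketch, where `sample` stands in for a parsed `response.json()` and the helper name is ours:

```python
# A response shaped like the example above (abbreviated).
sample = {
    "id": "gen-1234567890",
    "model": "google/gemma-3-27b-it",
    "choices": [
        {"message": {"role": "assistant", "content": "The document discusses..."}}
    ],
    "usage": {"prompt_tokens": 1000, "completion_tokens": 100, "total_tokens": 1100},
}

def extract_reply(data: dict) -> tuple[str, int]:
    """Return (assistant content, total tokens) from a completion response."""
    choices = data.get("choices") or []
    content = choices[0]["message"]["content"] if choices else ""
    total = data.get("usage", {}).get("total_tokens", 0)
    return content, total

text, tokens = extract_reply(sample)
print(text)    # The document discusses...
print(tokens)  # 1100
```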