# AGENTS.md

This file provides guidance to LLM Code Agents when working with code in this repository.

## Overview

This is a DokuWiki plugin that enables AI-powered chat functionality using LLMs (Large Language Models) and RAG (Retrieval-Augmented Generation). The plugin indexes wiki pages as embeddings in a vector database and allows users to ask questions about wiki content.

## Development Commands

### Testing
```bash
../../../bin/plugin.php dev test
```

### CLI Commands
The plugin provides a CLI interface via `cli.php`:

```bash
# Get a list of available commands
../../../bin/plugin.php aichat --help
```

## Architecture

### Core Components

**helper.php (helper_plugin_aichat)**
- Main entry point for plugin functionality
- Manages model factory and configuration
- Handles question answering with context retrieval
- Prepares messages with chat history and token limits
- Implements question rephrasing for better context search

**Embeddings.php**
- Manages the vector embeddings index
- Splits pages into chunks using TextSplitter
- Creates and retrieves embeddings via embedding models
- Performs similarity searches through storage backends
- Handles incremental indexing (only updates changed pages)

**TextSplitter.php**
- Splits text into token-sized chunks (configurable, typically ~1000 tokens)
- Prefers sentence boundaries using Vanderlee\Sentence
- Handles long sentences by splitting at word boundaries
- Maintains overlap between chunks (MAX_OVERLAP_LEN = 200 tokens) for context preservation

**ModelFactory.php**
- Creates and caches model instances (chat, rephrase, embedding)
- Loads model configurations from Model/*/models.json files
- Supports multiple providers: OpenAI, Gemini, Anthropic, Mistral, Ollama, Groq, Reka, VoyageAI

### Model System

**Model/AbstractModel.php**
- Base class for all LLM implementations
- Handles API communication with retry logic (MAX_RETRIES = 3)
- Tracks usage statistics (tokens, costs, time, requests)
- Implements debug mode for API inspection
- Uses DokuHTTPClient for HTTP requests

**Model Interfaces**
- `ChatInterface`: For conversational models (getAnswer method)
- `EmbeddingInterface`: For embedding models (getEmbedding method, getDimensions method)
- `ModelInterface`: Base interface with token limits and pricing info

**Model Providers**
Each provider has its own namespace under Model/:
- OpenAI/, Gemini/, Anthropic/, Mistral/, Ollama/, Groq/, Reka/, VoyageAI/
- Each contains ChatModel.php and/or EmbeddingModel.php
- Model info (token limits, pricing, dimensions) is defined in models.json

### Storage Backends

**Storage/AbstractStorage.php**
- Abstract base for vector storage implementations
- Defines interface for chunk storage and similarity search

**Available Implementations:**
- SQLiteStorage: Local SQLite database
- ChromaStorage: Chroma vector database
- PineconeStorage: Pinecone cloud service
- QdrantStorage: Qdrant vector database

### Data Flow

1. **Indexing**: Pages → TextSplitter → Chunks → EmbeddingModel → Vector Storage
2. **Querying**: Question → EmbeddingModel → Vector → Storage.getSimilarChunks() → Filtered Chunks
3. **Chat**: Question + History + Context Chunks → ChatModel → Answer

### Key Features

**Question Rephrasing**
- Converts follow-up questions into standalone questions using chat history
- Controlled by `rephraseHistory` config (number of history entries to use)
- Only applied when rephraseHistory > chatHistory to avoid redundancy

**Context Management**
- Chunks include breadcrumb trail (namespace hierarchy + page title)
- Token counting uses tiktoken-php to enforce token limits
- Respects the model's maximum input token length
- Filters chunks by ACL permissions and similarity threshold

**Language Support**
- `preferUIlanguage` setting controls language behavior:
  - LANG_AUTO_ALL: Auto-detect from question
  - LANG_UI_ALL: Always use UI language
  - LANG_UI_LIMITED: Use UI language and limit sources to that language

### AJAX Integration

**action.php**
- Handles `AJAX_CALL_UNKNOWN` event for 'aichat' calls
- Processes questions with chat history
- Returns JSON with answer (as rendered Markdown), sources, and similarity scores
- Implements access restrictions via helper->userMayAccess()
- Optional logging of all interactions

### Frontend
- **script/**: JavaScript for UI integration
- **syntax/**: DokuWiki syntax components
- **renderer.php**: Custom renderer for AI chat output

## Configuration

Plugin configuration is in `conf/`:
- **default.php**: Default config values
- **metadata.php**: Config field definitions and validation

Key settings:
- Model selection: chatmodel, rephrasemodel, embedmodel
- Storage: storage backend type
- API keys: openai_apikey, gemini_apikey, etc.
- Chunk settings: chunkSize, contextChunks, similarityThreshold
- History: chatHistory, rephraseHistory
- Access: restrict (user/group restrictions)
- Indexing filters: skipRegex, matchRegex

## Testing

Tests are in the `_test/` directory:
- Test classes extend the DokuWikiTest base class
- Tests use the @group plugin_aichat annotation

## Important Implementation Notes

- All token counting uses the TikToken encoder and should be treated as a rough estimate
- Chunk IDs are calculated as: pageID * 100 + chunk_sequence (pageIDs come from DokuWiki's internal search index)
- Models are cached in ModelFactory to avoid re-initialization
- API retries back off with an increasing delay (sleep for retry count seconds)
- Breadcrumb trails provide context to AI without requiring full page content
- Storage backends handle similarity search differently but provide a unified interface
- UTF-8 handling is critical for text splitting (uses dokuwiki\Utf8\PhpString)
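
The chunk ID scheme from the implementation notes (pageID * 100 + chunk_sequence) comes down to simple integer arithmetic. The following is a minimal sketch of that arithmetic, not the plugin's code: `composeChunkId()` and `splitChunkId()` are hypothetical helper names, and the 0..99 range for the sequence number is an assumption inferred from the formula, not something the source states.

```php
<?php
// Hypothetical helpers illustrating chunkID = pageID * 100 + chunk_sequence.
// These names are NOT part of the plugin; they only demonstrate the arithmetic.

/**
 * Combine a page ID and a chunk sequence number into one chunk ID.
 * Assumption: the sequence stays within 0..99, because a larger value
 * would spill into the ID range of the next page.
 */
function composeChunkId(int $pageId, int $sequence): int
{
    if ($sequence < 0 || $sequence > 99) {
        throw new InvalidArgumentException('chunk sequence must be between 0 and 99');
    }
    return $pageId * 100 + $sequence;
}

/**
 * Recover [pageId, sequence] from a chunk ID.
 */
function splitChunkId(int $chunkId): array
{
    return [intdiv($chunkId, 100), $chunkId % 100];
}

$id = composeChunkId(42, 7);
[$page, $seq] = splitChunkId($id);
echo "$id => page $page, chunk $seq\n"; // prints "4207 => page 42, chunk 7"
```

If the formula is applied as written, all chunks of one page share a contiguous ID range, so a page's chunks can be located from its pageID alone.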
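
The retry behavior mentioned for Model/AbstractModel.php (MAX_RETRIES = 3, sleeping for the retry count in seconds) can be sketched as a small loop. This is an illustration under stated assumptions, not the plugin's implementation: `requestWithRetry()` and both callables are hypothetical, and whether the real code starts counting the delay at 0 or 1 is not stated in this file. The sketch waits 1, 2, then 3 seconds.

```php
<?php
// Illustrative retry loop; NOT the code from Model/AbstractModel.php.
// MAX_RETRIES = 3 and "sleep for retry count seconds" come from this file;
// the function name and the injected callables are assumptions.

const MAX_RETRIES = 3;

/**
 * Call $send until it succeeds or MAX_RETRIES retries have been used.
 *
 * @param callable $send  performs the request, throws RuntimeException on failure
 * @param callable $sleep receives the seconds to wait (injectable so tests need not sleep)
 * @return mixed whatever $send returns
 */
function requestWithRetry(callable $send, callable $sleep)
{
    for ($retry = 0; ; $retry++) {
        try {
            return $send();
        } catch (RuntimeException $e) {
            if ($retry >= MAX_RETRIES) {
                throw $e; // all retries used, give up
            }
            // Assumption: wait as many seconds as the number of the upcoming retry.
            $sleep($retry + 1);
        }
    }
}

// Usage: the request fails twice, then succeeds on the third attempt.
$attempts = 0;
$waits = [];
$result = requestWithRetry(
    function () use (&$attempts) {
        $attempts++;
        if ($attempts < 3) {
            throw new RuntimeException('temporary failure');
        }
        return 'ok';
    },
    function (int $seconds) use (&$waits) {
        $waits[] = $seconds; // record the delay instead of sleeping
    }
);
echo $result, ' after ', $attempts, " attempts\n"; // prints "ok after 3 attempts"
```

Passing PHP's built-in `'sleep'` as the second argument would make the loop actually wait. The delay here grows by one second per retry, which is linear growth.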
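
The querying step under Data Flow ends with chunks filtered by a similarity threshold. The sketch below shows one way such filtering can work. It assumes cosine similarity and a plain array of vectors, neither of which is confirmed by this file; each storage backend performs its own search. `cosineSimilarity()` and `similarChunks()` are hypothetical names and do not reflect the signature of `getSimilarChunks()`.

```php
<?php
// Sketch of threshold filtering. Cosine similarity, the array shapes and the
// function names are assumptions for illustration, not the plugin's API.

/** Cosine similarity of two equally long numeric vectors. */
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // a zero vector has no direction
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Return up to $limit chunks scoring at least $threshold, best first.
 *
 * @param array $query  embedding of the question
 * @param array $chunks map of chunk text => embedding vector
 * @return array map of chunk text => similarity score
 */
function similarChunks(array $query, array $chunks, float $threshold, int $limit): array
{
    $scored = [];
    foreach ($chunks as $text => $vector) {
        $score = cosineSimilarity($query, $vector);
        if ($score >= $threshold) {
            $scored[$text] = $score;
        }
    }
    arsort($scored); // highest similarity first, keys preserved
    return array_slice($scored, 0, $limit, true);
}

$chunks = [
    'about cats'    => [1.0, 0.0],
    'about dogs'    => [0.0, 1.0],
    'cats and dogs' => [1.0, 1.0],
];
$result = similarChunks([1.0, 0.0], $chunks, 0.5, 2);
print_r(array_keys($result)); // 'about cats', then 'cats and dogs'
```

In the plugin, ACL filtering is applied alongside the threshold; that step is left out here because it depends on DokuWiki's permission checks.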