# AGENTS.md

This file provides guidance to LLM Code Agents when working with code in this repository.

## Overview

This is a DokuWiki plugin that enables AI-powered chat functionality using LLMs (Large Language Models) and RAG (Retrieval-Augmented Generation). The plugin indexes wiki pages as embeddings in a vector database and allows users to ask questions about wiki content.

## Development Commands

### Testing

```bash
../../../bin/plugin.php dev test
```

The command does not accept any additional arguments or parameters and runs all tests in the `_test/` directory.

PHPUnit can also be called directly when special options are needed:

```bash
../../../_test/vendor/bin/phpunit -c ../../../_test/phpunit.xml --group plugin_aichat
```

### CLI Commands

The plugin provides a CLI interface via `cli.php`:

```bash
# Get a list of available commands
../../../bin/plugin.php aichat --help
```

## Architecture

### Core Components

**helper.php (helper_plugin_aichat)**
- Main entry point for plugin functionality
- Manages model factory and configuration
- Handles question answering with context retrieval
- Prepares messages with chat history and token limits
- Implements question rephrasing for better context search

**Embeddings.php**
- Manages the vector embeddings index
- Splits pages into chunks using TextSplitter
- Creates and retrieves embeddings via embedding models
- Performs similarity searches through storage backends
- Handles incremental indexing (only updates changed pages)

**TextSplitter.php**
- Splits text into token-sized chunks (configurable, typically ~1000 tokens)
- Prefers sentence boundaries using Vanderlee\Sentence
- Handles long sentences by splitting at word boundaries
- Maintains overlap between chunks (MAX_OVERLAP_LEN = 200 tokens) for context preservation

**ModelFactory.php**
- Creates and caches model instances (chat, rephrase, embedding)
- Loads model configurations from `Model/*/models.json` files
- Supports multiple providers: OpenAI, Gemini, Anthropic, Mistral, Ollama, Groq, Reka, VoyageAI

### Model System

**Model/AbstractModel.php**
- Base class for all LLM implementations
- Handles API communication with retry logic (MAX_RETRIES = 3)
- Tracks usage statistics (tokens, costs, time, requests)
- Implements debug mode for API inspection
- Uses DokuHTTPClient for HTTP requests

**Model Interfaces**
- `ChatInterface`: For conversational models (getAnswer method)
- `EmbeddingInterface`: For embedding models (getEmbedding method, getDimensions method)
- `ModelInterface`: Base interface with token limits and pricing info

**Model Providers**
Each provider has its own namespace under Model/:
- OpenAI/, Gemini/, Anthropic/, Mistral/, Ollama/, Groq/, Reka/, VoyageAI/
- Each contains ChatModel.php and/or EmbeddingModel.php
- Model info (token limits, pricing, dimensions) defined in models.json

### Storage Backends

**Storage/AbstractStorage.php**
- Abstract base for vector storage implementations
- Defines interface for chunk storage and similarity search

**Available Implementations:**
- SQLiteStorage: Local SQLite database
- ChromaStorage: Chroma vector database
- PineconeStorage: Pinecone cloud service
- QdrantStorage: Qdrant vector database

### Data Flow

1. **Indexing**: Pages → TextSplitter → Chunks → EmbeddingModel → Vector Storage
2. **Querying**: Question → EmbeddingModel → Vector → Storage.getSimilarChunks() → Filtered Chunks
3. **Chat**: Question + History + Context Chunks → ChatModel → Answer

### Key Features

**Question Rephrasing**
- Converts follow-up questions into standalone questions using chat history
- Controlled by `rephraseHistory` config (number of history entries to use)
- Only applied when rephraseHistory > chatHistory to avoid redundancy

**Context Management**
- Chunks include breadcrumb trail (namespace hierarchy + page title)
- Token counting uses tiktoken-php for accurate limits
- Respects model's max input token length
- Filters chunks by ACL permissions and similarity threshold

**Language Support**
- `preferUIlanguage` setting controls language behavior:
  - LANG_AUTO_ALL: Auto-detect from question
  - LANG_UI_ALL: Always use UI language
  - LANG_UI_LIMITED: Use UI language and limit sources to that language

### AJAX Integration

**action.php**
- Handles `AJAX_CALL_UNKNOWN` event for 'aichat' calls
- Processes questions with chat history
- Returns JSON with answer (as rendered Markdown), sources, and similarity scores
- Implements access restrictions via helper->userMayAccess()
- Optional logging of all interactions

### Frontend

- **script/**: JavaScript for UI integration
- **syntax/**: DokuWiki syntax components
- **renderer.php**: Custom renderer for AI chat output

## Configuration

Plugin configuration is in `conf/`:
- **default.php**: Default config values
- **metadata.php**: Config field definitions and validation

Key settings:
- Model selection: chatmodel, rephrasemodel, embedmodel
- Storage: storage backend type
- API keys: openai_apikey, gemini_apikey, etc.
- Chunk settings: chunkSize, contextChunks, similarityThreshold
- History: chatHistory, rephraseHistory
- Access: restrict (user/group restrictions)
- Indexing filters: skipRegex, matchRegex

## Testing

Tests are in the `_test/` directory:
- Extends DokuWikiTest base class
- Uses @group plugin_aichat annotation

## Important Implementation Notes

- All token counting uses TikToken encoder for rough estimates
- Chunk IDs are calculated as: pageID * 100 + chunk_sequence (pageIDs come from DokuWiki's internal search index)
- Models are cached in ModelFactory to avoid re-initialization
- API retries back off with an increasing delay (sleep for retry count seconds)
- Breadcrumb trails provide context to AI without requiring full page content
- Storage backends handle similarity search differently but provide unified interface
- UTF-8 handling is critical for text splitting (uses dokuwiki\Utf8\PhpString)
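
The chunk ID formula in the notes above can be illustrated with a short sketch. It is written in Python purely for illustration (the plugin itself is PHP), and the function names `chunk_id` and `split_chunk_id` are hypothetical. The range check on the sequence number is an assumption: the formula only yields unique IDs while a page has fewer than 100 chunks, and how the plugin treats larger pages is not stated here.

```python
# Illustrative sketch only; names are hypothetical, not the plugin's API.
CHUNKS_PER_PAGE = 100  # multiplier from "pageID * 100 + chunk_sequence"


def chunk_id(page_id: int, chunk_sequence: int) -> int:
    """Combine a page ID and a chunk sequence number into one chunk ID."""
    # Assumption: sequence numbers must stay below the multiplier,
    # otherwise IDs of neighbouring pages would collide.
    if not 0 <= chunk_sequence < CHUNKS_PER_PAGE:
        raise ValueError("chunk_sequence must be in the range 0..99")
    return page_id * CHUNKS_PER_PAGE + chunk_sequence


def split_chunk_id(cid: int) -> tuple[int, int]:
    """Recover (page_id, chunk_sequence) from a chunk ID."""
    return divmod(cid, CHUNKS_PER_PAGE)
```

For example, `chunk_id(42, 7)` gives `4207`, and `split_chunk_id(4207)` gives `(42, 7)`, so all chunks of one page occupy a contiguous block of 100 IDs.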
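
The retry note above (sleep for retry count seconds, with MAX_RETRIES = 3) might look roughly like the following. This is a Python sketch under stated assumptions, not the code in `Model/AbstractModel.php`: the helper name `call_with_retries`, the injectable `sleep` parameter, the choice to retry on any exception, and the decision not to count the first attempt toward the limit are all hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

MAX_RETRIES = 3  # value quoted in the AbstractModel notes


def call_with_retries(
    request: Callable[[], T],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying failures with a delay equal to the retry count."""
    retry = 0
    while True:
        try:
            return request()
        except Exception:  # assumption: every error is treated as retryable
            if retry >= MAX_RETRIES:
                raise  # give up and surface the last error
            retry += 1
            sleep(retry)  # waits 1s, then 2s, then 3s
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a caller can record the delays in a list instead of sleeping.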