Elasticsearch can index media files by using external parsers to extract their text content.

Parsers can be command-line tools or web services. They must return plain text or a JSON structure.

Define a parser for each file extension that should be processed. For CLI tools, use ''%in%'' as a placeholder for the input file. Web services must accept the input file as POST data.

Here's a short example:

<code>
pdf    /usr/bin/pdftotext %in% -
docx   http://givemetext.okfnlabs.org/tika/rmeta
</code>
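To make the format more concrete, the following Python sketch shows one way such a configuration could be interpreted. It is not the plugin's actual code: the function names ''parse_parser_config'', ''is_web_service'' and ''build_command'' are hypothetical, and treating any parser that starts with ''http://'' or ''https://'' as a web service is an assumption made for illustration.

```python
import shlex


def parse_parser_config(text):
    """Map each file extension to its parser (a command line or a URL)."""
    parsers = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue  # skip blank or incomplete lines
        ext, parser = parts
        parsers[ext.lower()] = parser.strip()
    return parsers


def is_web_service(parser):
    """Assumption: a parser given as an HTTP(S) URL is a web service."""
    return parser.startswith(("http://", "https://"))


def build_command(parser, path):
    """Substitute the %in% placeholder with the shell-quoted input path."""
    return parser.replace("%in%", shlex.quote(path))


CONFIG = """\
pdf    /usr/bin/pdftotext %in% -
docx   http://givemetext.okfnlabs.org/tika/rmeta
"""

if __name__ == "__main__":
    for ext, parser in parse_parser_config(CONFIG).items():
        if is_web_service(parser):
            print(f"{ext}: POST the file to {parser}")
        else:
            print(f"{ext}: run {build_command(parser, '/tmp/example.' + ext)}")
```

Quoting the path before substitution keeps file names with spaces intact when the resulting command line is passed to a shell.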