Source types
How Cognoir handles each source type.
Cognoir parses sixteen document types and three kinds of input. The parsing strategy and citation format adapt to each one.
Religious and classical
Quran
Citations use surah and ayah. When the source is a translation, the translator is named (Saheeh International, Yusuf Ali, Pickthall, Mohsin Khan, and others). When the source is the Arabic text, the citation is to the Arabic.
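The translation rule above can be illustrated with a minimal sketch. It assumes a simple record that either names a translator or leaves that field empty for the Arabic text. The `QuranSource` record, the `format_quran_citation` function, and the rendered format are hypothetical and chosen for this example. They are not Cognoir's internal API or its exact output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuranSource:
    """Hypothetical record describing a Quran passage as it appears in a source."""
    surah: int
    ayah: int
    translator: Optional[str] = None  # None means the source is the Arabic text


def format_quran_citation(src: QuranSource) -> str:
    """Render a surah:ayah citation; the exact output format is an assumption."""
    ref = f"Quran {src.surah}:{src.ayah}"
    if src.translator is not None:
        # Translated source: name the translator the source actually uses.
        return f"{ref} (trans. {src.translator})"
    # No translator given: the citation is to the Arabic text itself.
    return f"{ref} (Arabic)"


print(format_quran_citation(QuranSource(2, 255, "Saheeh International")))
# Quran 2:255 (trans. Saheeh International)
print(format_quran_citation(QuranSource(2, 255)))
# Quran 2:255 (Arabic)
```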
Hadith
Citations include the collection name, book number, hadith number, and the chain of narration where the source provides one. Authenticity grades (sahih, hasan, da'if) are preserved as the source presents them, never inferred.
Tafsir, fiqh, and classical Islamic texts
Citations include author, work, volume, and page. Hijri and Gregorian dates are preserved as the source presents them.
Other religious texts
Author, work, and section. The format is deliberately conservative: Cognoir does not impose interpretive frameworks that the source does not provide.
Legal and financial
Case law
Citations include the case name, the official citation, the paragraph reference, and the jurisdiction. Cognoir does not infer jurisdiction from context; if the source does not state it, the citation does not claim it.
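The "never infer jurisdiction" rule amounts to rendering only the fields a source actually states. The following minimal sketch is written under that assumption. The `CaseSource` record, the `format_case_citation` function, the field order, and the case values in the usage lines are all hypothetical, and the code does not reproduce Cognoir's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaseSource:
    """Hypothetical record of what a case-law source states about itself."""
    case_name: str
    official_citation: str
    paragraph: Optional[str] = None
    jurisdiction: Optional[str] = None  # set only when the source states it


def format_case_citation(src: CaseSource) -> str:
    """Join the fields the source provides; absent fields are omitted, not guessed."""
    parts = [src.case_name, src.official_citation]
    if src.paragraph is not None:
        parts.append(f"para. {src.paragraph}")
    if src.jurisdiction is not None:
        parts.append(src.jurisdiction)
    # A missing jurisdiction is left out rather than filled in from context.
    return ", ".join(parts)


# Hypothetical case values, for illustration only.
print(format_case_citation(CaseSource("Doe v Roe", "[2021] ABC 1", paragraph="14")))
# Doe v Roe, [2021] ABC 1, para. 14
```

Treating a missing field as "omit" rather than "look it up elsewhere" keeps every part of the rendered citation traceable to the source text.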
Statutes and regulations
Instrument name, article or section, and the official publication reference (the OJ L reference for EU legislation, the U.S. Code title and section for US federal statutes, the BOE reference for Spanish regulations, and so on).
Contracts
Document title, clause reference, and party context. Cross-document conflict detection: ask “where do these contracts disagree on indemnity?” and Cognoir surfaces the conflicting clauses.
Government documents
Instrument type, article or section, and official reference.
Financial filings
Filing type (10-K, 10-Q, 8-K, S-1, and so on), section, and reporting period.
Technical compliance
Document title and section. Each document is treated as a primary or secondary source depending on where it comes from.
Academic, medical, and general
Peer-reviewed papers
Author, year, section (introduction, methods, results, discussion), and page. When the source includes a DOI, it is preserved.
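Preserving a DOI first requires recognising one in the source text. The following minimal sketch shows one way a parser could do that. It is based on a regular expression that Crossref has published for matching modern DOIs, and it assumes that trailing sentence punctuation is not part of the identifier. The `find_doi` function is hypothetical and is not Cognoir's code.

```python
import re
from typing import Optional

# Based on a pattern Crossref has published for modern DOIs;
# some older, irregular DOIs will not match.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def find_doi(text: str) -> Optional[str]:
    """Return the first DOI-like string in the text, or None if there is none."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Sentence punctuation directly after a DOI is usually not part of it.
    return match.group(0).rstrip(".,;")


print(find_doi("Available at https://doi.org/10.1000/182."))
# 10.1000/182
```

This is a simplified sketch. For example, a closing parenthesis right after a DOI would still be captured, so a production parser would need extra rules for edge cases like that.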
Medical literature
Journal, study title, section, and page. Systematic reviews and randomised controlled trials are cited with their stated methodology intact.
Books
Author, title, edition, chapter, page. Front matter and end matter are treated as their own structural elements.
Technical documentation
Document, section, version. Software documentation handles code blocks separately from prose.
Educational materials
Source, chapter or module, page. Lecture notes are parsed as their own structural type.
News articles
Publication, date, byline, paragraph. Editorials and opinion pieces are distinguished from reported content where the source provides that distinction.
Input formats
Documents
PDF (with OCR fallback for scanned pages), Word (.docx), slides (.pptx), spreadsheets (.xlsx, .csv), plain text, Markdown, HTML.
Web pages
URLs are fetched, the main content is extracted, and the result is treated as the source type the page represents (a news article cites as news, a research paper cites as a paper, and so on).
Images and scanned pages
JPEG, PNG, and WebP files are processed by Vision OCR. The extracted text is then cited as whatever source type its content represents — a photo of a textbook page becomes a book citation; a screenshot of a contract becomes a contract citation.
Try Cognoir with your own documents.