Source types

How Cognoir handles each source type.

Cognoir parses sixteen document types and three input formats. The parsing strategy and citation format adapt to each one.

Religious and classical

Quran

Citations use surah and ayah. When the source is a translation, the translator is named (Saheeh International, Yusuf Ali, Pickthall, Mohsin Khan, and others). When the source is the Arabic text, the citation is to the Arabic.
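As a rough illustration of the rule above, the sketch below models a Quran citation as a small record. The class name `QuranCitation`, its fields, and the `surah:ayah` rendering are assumptions made for this example; they are not Cognoir's actual data model or output format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuranCitation:
    """Hypothetical citation record, for illustration only."""
    surah: int                        # chapter number
    ayah: int                         # verse number within the surah
    translator: Optional[str] = None  # None means the source is the Arabic text

    def render(self) -> str:
        ref = f"Quran {self.surah}:{self.ayah}"
        # Name the translator only when the source is a translation.
        if self.translator is not None:
            return f"{ref} ({self.translator})"
        return f"{ref} (Arabic)"


print(QuranCitation(2, 255, "Saheeh International").render())
print(QuranCitation(2, 255).render())
```

The translator field is optional so that a citation to the Arabic text never carries a translator name by accident.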

Hadith

Citations include the collection name, book number, hadith number, and the chain of narration where the source provides one. Authenticity grades (sahih, hasan, da'if) are preserved as the source presents them, never inferred.
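One way to picture "preserved, never inferred" is a record whose chain and grade fields stay empty unless the source supplies them. This is a minimal sketch under that assumption; `HadithCitation`, its field names, and the placeholder collection are hypothetical and do not describe Cognoir's internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HadithCitation:
    """Hypothetical citation record, for illustration only."""
    collection: str
    book: int
    number: int
    isnad: Optional[str] = None  # chain of narration, only if the source gives one
    grade: Optional[str] = None  # e.g. "sahih", copied verbatim from the source

    def render(self) -> str:
        parts = [f"{self.collection}, Book {self.book}, Hadith {self.number}"]
        # Optional fields appear only when the source provided them;
        # nothing is filled in by guessing.
        if self.isnad:
            parts.append(f"chain: {self.isnad}")
        if self.grade:
            parts.append(f"grade: {self.grade}")
        return "; ".join(parts)


# "Example Collection" is a placeholder, not a real collection.
print(HadithCitation("Example Collection", 2, 15).render())
print(HadithCitation("Example Collection", 2, 15, grade="hasan").render())
```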

Tafsir, fiqh, and classical Islamic texts

Citations include author, work, volume, and page. Hijri and Gregorian dating are preserved as the source presents them.

Other religious texts

Author, work, and section. The format is conservative: Cognoir does not impose interpretive frameworks the source does not provide.

Legal and financial

Case law

Citations include the case name, the official citation, the paragraph reference, and the jurisdiction. Cognoir does not infer jurisdiction from context; if the source does not state it, the citation does not claim it.
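The jurisdiction rule can be sketched as a formatter that reads only what was extracted from the source. The function `render_case_citation`, the metadata keys, and the sample case below are all hypothetical; the point is only that a missing jurisdiction stays missing.

```python
from typing import Mapping


def render_case_citation(meta: Mapping[str, str]) -> str:
    """Format a case citation from extracted metadata (hypothetical keys)."""
    text = f"{meta['case_name']}, {meta['official_citation']}, para. {meta['paragraph']}"
    # Jurisdiction is appended only if the source stated it.
    # There is deliberately no fallback that guesses from context.
    jurisdiction = meta.get("jurisdiction")
    if jurisdiction:
        text += f" ({jurisdiction})"
    return text


# Placeholder case, not a real decision.
stated = {
    "case_name": "Example v Sample",
    "official_citation": "[2020] EX 1",
    "paragraph": "12",
    "jurisdiction": "England and Wales",
}
unstated = {k: v for k, v in stated.items() if k != "jurisdiction"}

print(render_case_citation(stated))
print(render_case_citation(unstated))
```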

Statutes and regulations

Instrument name, article or section, and the official publication reference (OJ L number for EU instruments, USC title:section for US statutes, BOE for Spanish regulations, and so on).

Contracts

Document title, clause reference, and party context. Cognoir also detects conflicts across documents: ask “where do these contracts disagree on indemnity?” and it surfaces the conflicting clauses.

Government documents

Instrument type, article or section, and official reference.

Financial filings

Filing type (10-K, 10-Q, 8-K, S-1, and so on), section, and reporting period.

Technical compliance

Document title and section. Each document is classified as primary or secondary according to its source.

Academic, medical, and general

Peer-reviewed papers

Author, year, section (introduction, methods, results, discussion), and page. When the source includes a DOI, it is preserved.

Medical literature

Journal, study title, section, and page. Systematic reviews and randomised controlled trials are cited with their stated methodology intact.

Books

Author, title, edition, chapter, page. Front matter and end matter are treated as their own structural elements.

Technical documentation

Document, section, and version. In software documentation, code blocks are handled separately from prose.

Educational materials

Source, chapter or module, page. Lecture notes are parsed as their own structural type.

News articles

Publication, date, byline, paragraph. Editorials and opinion pieces are distinguished from reported content where the source provides that distinction.

Input formats

Documents

PDF (with OCR fallback for scanned pages), Word (.docx), slides (.pptx), spreadsheets (.xlsx, .csv), plain text, Markdown, HTML.

Web pages

URLs are fetched, the main content is extracted, and the result is treated as the source type the page represents (a news article cites as news, a research paper cites as a paper, and so on).

Images and scanned pages

JPEG, PNG, and WebP files are processed by Vision OCR. The extracted text is then cited as whatever source type its content represents: a photo of a textbook page becomes a book citation, and a screenshot of a contract becomes a contract citation.
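The three input formats can be pictured as a simple routing step that runs before any source-type detection. The sketch below is an assumption about how such routing might look, not Cognoir's implementation: the function name `route_input`, the pipeline labels, and the specific extensions chosen for plain text, Markdown, HTML, and JPEG are all illustrative.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions for plain text, Markdown, and HTML are assumed for this sketch.
DOCUMENT_EXTENSIONS = {".pdf", ".docx", ".pptx", ".xlsx", ".csv", ".txt", ".md", ".html"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def route_input(source: str) -> str:
    """Return a hypothetical pipeline label for a file name or URL."""
    # URLs are fetched as web pages, whatever the path looks like.
    if urlparse(source).scheme in ("http", "https"):
        return "web"
    suffix = PurePosixPath(source).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image-ocr"  # text is extracted by OCR before citation
    if suffix in DOCUMENT_EXTENSIONS:
        return "document"
    raise ValueError(f"unsupported input: {source!r}")


print(route_input("report.pdf"))
print(route_input("https://example.com/article"))
print(route_input("scan.JPEG"))
```

Routing decides only how text is obtained. The citation format still depends on what the extracted content turns out to be.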
