Source types

How Cognoir handles each source type.

Cognoir parses fourteen document types and three input formats. The parsing strategy and citation format adapt to each one.

Religious and classical

Quran

Citations use surah and ayah. When the source is a translation, the translator is named (Saheeh International, Yusuf Ali, Pickthall, Mohsin Khan, and others). When the source is the Arabic text, the citation is to the Arabic.
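As a rough illustration of the rule above, a Quran citation can be modelled as a small record with an optional translator. This is only a sketch: the class name, field names, and rendered string format are hypothetical and are not Cognoir's actual schema or output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuranCitation:
    """Hypothetical citation record (illustrative only, not Cognoir's schema)."""
    surah: int
    ayah: int
    translator: Optional[str] = None  # None means the source is the Arabic text

    def render(self) -> str:
        ref = f"Quran {self.surah}:{self.ayah}"
        if self.translator is None:
            # No translation involved, so the citation points at the Arabic.
            return f"{ref} (Arabic)"
        # A translation was used, so the translator is always named.
        return f"{ref} (trans. {self.translator})"


print(QuranCitation(2, 255, "Saheeh International").render())
print(QuranCitation(2, 255).render())
```

Keeping the translator as an optional field means a translated passage and the Arabic original can never be confused in the rendered citation.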

Hadith

Citations include the collection name, book number, hadith number, and the chain of narration where the source provides one. Authenticity grades (sahih, hasan, da'if) are preserved as the source presents them, never inferred.
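The "preserved, never inferred" rule can be sketched by making the chain and the grade optional fields that are rendered only when the source supplied them. The names below and the placeholder values are assumptions for illustration, not Cognoir's real data model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HadithCitation:
    """Hypothetical citation record (illustrative only)."""
    collection: str
    book: int
    number: int
    chain: Optional[Tuple[str, ...]] = None  # set only if the source gives a chain
    grade: Optional[str] = None              # copied verbatim from the source

    def render(self) -> str:
        parts = [f"{self.collection}, Book {self.book}, Hadith {self.number}"]
        if self.chain:
            parts.append("chain: " + " <- ".join(self.chain))
        if self.grade is not None:
            # The grade is echoed exactly as given; nothing is computed here.
            parts.append(f"grade: {self.grade}")
        return "; ".join(parts)


# Placeholder values, not real references.
bare = HadithCitation("Example Collection", 3, 12)
full = HadithCitation("Example Collection", 3, 12,
                      chain=("Narrator A", "Narrator B"), grade="hasan")
print(bare.render())
print(full.render())
```

Because there is no default grade and no code path that derives one, a citation without a source-provided grade simply omits it.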

Tafsir, fiqh, and classical Islamic texts

Citations include author, work, volume, and page. Hijri and Gregorian dating are preserved as the source presents them.

Other religious texts

Author, work, and section. The format is conservative: we do not impose interpretive frameworks the source does not provide.

Academic, medical, and general

Peer-reviewed papers

Author, year, section (introduction, methods, results, discussion), and page. When the source includes a DOI, it is preserved.

Medical literature

Journal, study title, section, page. Systematic reviews and randomised controlled trials are cited with their stated methodology intact.

Books

Author, title, edition, chapter, page. Front matter and end matter are treated as their own structural elements.

Technical documentation

Document, section, version. In software documentation, code blocks are handled separately from prose.

Educational materials

Source, chapter or module, page. Lecture notes are parsed as their own structural type.

News articles

Publication, date, byline, paragraph. Editorials and opinion pieces are distinguished from reported content where the source provides that distinction.

Input formats

Documents

PDF (with OCR fallback for scanned pages), Word (.docx), slides (.pptx), spreadsheets (.xlsx, .csv), plain text, Markdown, HTML.
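One plausible way to route the document formats listed above is a lookup from file extension to a handler label. This is a minimal sketch: the handler labels and the function name are hypothetical, and the `.txt`, `.md`, and `.htm` extensions are assumptions, since the list names those formats without giving extensions.

```python
from pathlib import PurePath

# Extension -> handler label. Labels are hypothetical, not Cognoir's names.
HANDLERS = {
    ".pdf": "pdf",          # scanned pages would fall back to OCR downstream
    ".docx": "word",
    ".pptx": "slides",
    ".xlsx": "spreadsheet",
    ".csv": "spreadsheet",
    ".txt": "plain_text",   # assumed extension
    ".md": "markdown",      # assumed extension
    ".html": "html",
    ".htm": "html",         # assumed extension
}


def pick_handler(filename: str) -> str:
    """Return the handler label for a file, matching the extension case-insensitively."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in HANDLERS:
        raise ValueError(f"unsupported document type: {filename!r}")
    return HANDLERS[suffix]


print(pick_handler("Report.PDF"))
print(pick_handler("figures.csv"))
```

A production system would likely inspect file contents as well, since an extension alone can be wrong.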

Web pages

URLs are fetched, the main content is extracted, and the result is treated as the source type the page represents (a news article cites as news, a research paper cites as a paper, and so on).

Images and scanned pages

JPEG, PNG, and WebP files are processed by Vision OCR. The extracted text is then cited as whatever source type its content represents: a photo of a textbook page becomes a book citation, and a screenshot of a contract becomes a contract citation.
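The key idea in the image flow is that the input format (an image) and the source type (what the text actually is) are decided separately. The sketch below shows that separation with the OCR and classification steps passed in as functions. Everything here is hypothetical, including the function names and the trivial stand-ins, and it does not represent how Cognoir's OCR or classification works.

```python
from pathlib import PurePath
from typing import Callable, Dict

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def ingest_image(
    filename: str,
    data: bytes,
    ocr: Callable[[bytes], str],
    classify: Callable[[str], str],
) -> Dict[str, str]:
    """Run OCR, then label the text by what it contains, not by the file type."""
    if PurePath(filename).suffix.lower() not in IMAGE_SUFFIXES:
        raise ValueError(f"not a supported image: {filename!r}")
    text = ocr(data)
    return {"input_format": "image", "source_type": classify(text), "text": text}


# Stand-ins so the sketch runs; real OCR and classification are far more involved.
def fake_ocr(data: bytes) -> str:
    return data.decode("utf-8")


def fake_classify(text: str) -> str:
    return "book" if "Chapter" in text else "unknown"


result = ingest_image("page.jpg", b"Chapter 3: Cells", fake_ocr, fake_classify)
print(result["source_type"])  # book
```

The same two-step shape would apply to fetched web pages: extract the text first, then pick the citation format from the content.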
