[MODEL REQUEST] Kosmos-2.5 #58

Open
EwoutH opened this issue Jun 24, 2024 · 1 comment
Labels
Feature Request New feature or request

Comments


EwoutH commented Jun 24, 2024

Kosmos-2.5 is a relatively small (1.37B params) generative model for machine reading of text-intensive images.

Details of model being requested

  • Model name: Kosmos-2.5
  • Source repo link: https://huggingface.co/microsoft/kosmos-2.5
  • Research paper link: https://arxiv.org/abs/2309.11419
  • Model use case: Kosmos-2.5 is a multimodal literate model for machine reading of text-intensive images. Pre-trained on large-scale text-intensive images, Kosmos-2.5 excels in two distinct yet cooperative transcription tasks: (1) generating spatially-aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structures into the markdown format. This unified multimodal literate capability is achieved through a shared decoder-only auto-regressive Transformer architecture, task-specific prompts, and flexible text representations. We evaluate Kosmos-2.5 on end-to-end document-level text recognition and image-to-markdown text generation. Furthermore, the model can be readily adapted for any text-intensive image understanding task with different prompts through supervised fine-tuning, making it a general-purpose tool for real-world applications involving text-rich images. This work also paves the way for the future scaling of multimodal large language models.

In addition to the full-precision F32 model, a quantized version might also be very useful.
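
For a rough sense of what quantization would save, here is a back-of-the-envelope sketch based on the 1.37B parameter count mentioned above. The list of formats and their byte widths are illustrative assumptions, not formats anyone has committed to shipping, and the estimate covers weights only: it ignores activations, quantization scales and zero-points, and runtime overhead.

```python
# Rough weight-storage estimate for a 1.37B-parameter model.
# Assumption: every parameter is stored at the same width; real quantized
# checkpoints usually keep some tensors at higher precision and add metadata.

PARAMS = 1.37e9  # parameter count stated in this issue

# Illustrative storage widths in bytes per parameter (assumed, not official).
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_size_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB, using 1 GB = 1e9 bytes."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for fmt, width in BYTES_PER_PARAM.items():
        print(f"{fmt:>8}: {weight_size_gb(PARAMS, width):.2f} GB")
```

Under these assumptions the F32 weights take roughly 5.5 GB, while an 8-bit version would take roughly 1.4 GB, which is the main reason a quantized release would help on smaller devices.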


mestrona-3 added the Feature Request label on Jun 26, 2024
@mestrona-3

Hi @EwoutH, thank you for the feature request and details! I've added it to our list of requested models.
