PyMuPDF 1.24.2 Documentation
encode())
7.3.2 How to use Markdown output
Once you have your data in Markdown format, you are ready to chunk/split it and supply it to your LLM. For example, with LangChain:
    import pdf4llm
    from langchain.text_splitter import MarkdownTextSplitter
    md_text = pdf4llm.to_markdown("input.pdf")  # get markdown for all pages
    splitter = MarkdownTextSplitter(chunk_size=40, chunk_overlap=0)
    splitter.create_documents([md_text])
For more, see 5 Levels of Text Splitting
0 credits | 565 pages | 6.84 MB | 1 year ago
PyMuPDF 1.12.2 documentation
photo can be used as an image in TK.
Extracting Text
We can also extract all text of a page as a single string:
>>> text = page.getText(type)
Use one of the following strings for type: "text": (default)
0 credits | 387 pages | 2.70 MB | 1 year ago
MuPDF 1.22.0 Documentation
chunks to fill this buffer. In the absence of any other impetus the receiver should request the next ‘chunk’ of data from the file that it does not yet have, following the last fill point. Initially we start
0 credits | 175 pages | 698.87 KB | 8 months ago
MuPDF 1.23.0 Documentation
chunks to fill this buffer. In the absence of any other impetus the receiver should request the next ‘chunk’ of data from the file that it does not yet have, following the last fill point. Initially we start
0 credits | 245 pages | 817.74 KB | 8 months ago
MuPDF 1.25.0 Documentation
chunks to fill this buffer. In the absence of any other impetus the receiver should request the next ‘chunk’ of data from the file that it does not yet have, following the last fill point. Initially we start
0 credits | 249 pages | 830.15 KB | 8 months ago
MuPDF 1.24.0 Documentation
chunks to fill this buffer. In the absence of any other impetus the receiver should request the next ‘chunk’ of data from the file that it does not yet have, following the last fill point. Initially we start
0 credits | 249 pages | 830.15 KB | 8 months ago
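The chunk-request rule quoted in the MuPDF snippets (request the next chunk not yet held, following the last fill point) might be sketched roughly as below. This is an illustration only, not MuPDF code: the class `ProgressiveBuffer`, its methods, the fixed chunk size, and the wrap-around to earlier gaps once the end of the file is reached are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


class ProgressiveBuffer:
    """Hypothetical helper: tracks which fixed-size chunks of a file have arrived."""

    def __init__(self, file_size: int, chunk_size: int) -> None:
        self.file_size = file_size
        self.chunk_size = chunk_size
        n_chunks = (file_size + chunk_size - 1) // chunk_size  # ceiling division
        self.have: List[bool] = [False] * n_chunks
        self.last_fill = 0  # index of the chunk just after the most recent fill

    def fill(self, index: int) -> None:
        """Record that chunk `index` has arrived and move the fill point past it."""
        self.have[index] = True
        self.last_fill = index + 1

    def next_request(self) -> Optional[Tuple[int, int]]:
        """Return (offset, length) of the next missing chunk after the fill point.

        Assumption: after reaching the end of the file the search wraps around
        to pick up earlier gaps. Returns None once every chunk is present.
        """
        n = len(self.have)
        for step in range(n):
            i = (self.last_fill + step) % n
            if not self.have[i]:
                start = i * self.chunk_size
                length = min(self.chunk_size, self.file_size - start)
                return start, length
        return None
```

In this sketch a receiver would call `next_request()` whenever it has nothing more urgent to fetch, and call `fill()` as each chunk arrives.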
6 results in total