- Implement `pdf_to_markdown.py` script with pypdf for text extraction
- Extract metadata (title, author, creation date) from PDFs
- Generate clean Markdown files with YAML front matter
- Add comprehensive error handling and logging
- Create `mise.toml` with 10+ convenient tasks for conversion
- Provide detailed documentation (4 guides + quick reference)
- Successfully convert all 18 PDF files in `artikel/` folder to Markdown
- Include `.gitignore` for Python cache and local config
# Quick Reference Card
## Mise Commands

```bash
# Main conversion
mise run convert            # Convert all PDFs

# Logging options
mise run convert-verbose    # Show detailed logs
mise run convert-quiet      # Errors only

# Preview & Check
mise run dry-run            # Preview without writing
mise run status             # Show progress

# Custom paths
INPUT_DIR=/path mise run convert-custom
INPUT_DIR=/in OUTPUT_DIR=/out mise run convert-custom

# Cleanup
mise run clean              # Remove Markdown only
mise run clean-all          # Remove all artifacts

# Help
mise tasks                  # List all tasks
mise run help               # Show task info
```
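The card does not show how `mise.toml` passes the logging, dry-run and custom-path options to the script. A minimal sketch of how a converter could turn them into settings, assuming hypothetical `--verbose`, `--quiet` and `--dry-run` flags and reading `INPUT_DIR`/`OUTPUT_DIR` from the environment; `Settings` and `resolve_settings` are illustrative names, not necessarily what `pdf_to_markdown.py` uses:

```python
import argparse
import logging
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Settings:
    input_dir: Path
    output_dir: Path
    dry_run: bool
    log_level: int


def resolve_settings(argv=None, environ=None):
    """Turn hypothetical CLI flags plus INPUT_DIR/OUTPUT_DIR into Settings."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="PDF to Markdown (sketch)")
    noise = parser.add_mutually_exclusive_group()
    noise.add_argument("--verbose", action="store_true", help="show detailed logs")
    noise.add_argument("--quiet", action="store_true", help="errors only")
    parser.add_argument("--dry-run", action="store_true", help="preview without writing")
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]

    # --verbose -> DEBUG, --quiet -> ERROR, neither -> INFO
    if args.verbose:
        level = logging.DEBUG
    elif args.quiet:
        level = logging.ERROR
    else:
        level = logging.INFO

    # Defaults mirror the artikel/ -> artikel/converted/ layout. An unset
    # OUTPUT_DIR follows INPUT_DIR, so `INPUT_DIR=/path` alone still works.
    input_dir = Path(environ.get("INPUT_DIR", "artikel"))
    output_dir = Path(environ.get("OUTPUT_DIR", input_dir / "converted"))
    return Settings(input_dir, output_dir, args.dry_run, level)
```

For example, `resolve_settings(["--verbose"], {"INPUT_DIR": "/in"})` yields a DEBUG log level and `/in/converted` as the output directory. An entry point would then call `logging.basicConfig(level=settings.log_level)` to apply the chosen verbosity.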
## File Locations

```text
artikel/
├── *.pdf          # Input PDFs
└── converted/
    └── *.md       # Output Markdown
```
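A sketch of the input-to-output mapping this layout implies, assuming each `artikel/<name>.pdf` becomes `artikel/converted/<name>.md`. `find_pdfs` and `markdown_path` are hypothetical helper names:

```python
from pathlib import Path


def find_pdfs(input_dir):
    """Return the PDFs directly inside input_dir, sorted by path."""
    # Only the top level is scanned, so nothing inside converted/ is picked up.
    # The suffix check is case-insensitive, so "Scan.PDF" is found as well.
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def markdown_path(pdf, output_dir):
    """Map <input_dir>/<name>.pdf to <output_dir>/<name>.md."""
    return Path(output_dir) / (Path(pdf).stem + ".md")
```

Unlike `ls artikel/*.pdf`, this version also matches upper-case `.PDF` suffixes. That was a deliberate choice in the sketch, not something the card specifies.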
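## One-Liner Setup

Run from the directory that contains the `maturaarbeit` checkout:

```bash
curl https://mise.jdx.dev/install.sh | sh && cd maturaarbeit && mise trust && mise run convert
```

The installer places `mise` in `~/.local/bin`. If that directory is not on your `PATH`, add it first, or the remaining commands will fail with "mise not found".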
## Output Format

```markdown
---
title: PDF Title
author: PDF Author
created: 2024-02-23
converted: 2024-02-23 14:32:15
source: filename.pdf
---

# PDF Title

## Page 1

[Text...]

## Page 2

[Text...]
```
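A minimal sketch of producing this layout, assuming the metadata and per-page text have already been extracted. `render_markdown` is a hypothetical name, not necessarily how `pdf_to_markdown.py` builds its output:

```python
from datetime import datetime


def render_markdown(title, author, created, source, pages, converted=None):
    """Build YAML front matter plus one '## Page N' section per page (sketch).

    `pages` is a list of page texts. `created` and `converted` are datetimes.
    `created` may be None when the PDF carries no creation date.
    """
    converted = converted or datetime.now()
    lines = [
        "---",
        f"title: {title}",
        f"author: {author}",
        f"created: {created:%Y-%m-%d}" if created else "created:",
        f"converted: {converted:%Y-%m-%d %H:%M:%S}",
        f"source: {source}",
        "---",
        "",
        f"# {title}",
    ]
    # Page numbers start at 1 to match the "## Page 1" heading on the card.
    for number, text in enumerate(pages, start=1):
        lines += ["", f"## Page {number}", "", text.strip()]
    return "\n".join(lines) + "\n"
```

Values are written unquoted, as on the card. A title containing `: ` would therefore produce invalid YAML, and quoting such values would avoid that.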
## Success Indicators

- ✅ All tasks complete
- ✅ 18/18 PDFs converted
- ✅ 3.5 MB output
- ✅ No errors
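Checking the "18/18" and "3.5 MB" indicators comes down to counting PDFs that already have a Markdown twin and summing the output sizes. A sketch of such a check, assuming the same `<name>.pdf` to `<name>.md` naming; `conversion_status` and `format_status` are hypothetical and not the actual `mise run status` task:

```python
from pathlib import Path


def conversion_status(input_dir, output_dir):
    """Return (converted, total, output_bytes) for the given directories."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    pdfs = sorted(input_dir.glob("*.pdf"))  # same pattern as `ls artikel/*.pdf`
    converted = [p for p in pdfs if (output_dir / f"{p.stem}.md").is_file()]
    output_bytes = (
        sum(md.stat().st_size for md in output_dir.glob("*.md"))
        if output_dir.is_dir()
        else 0  # nothing converted yet
    )
    return len(converted), len(pdfs), output_bytes


def format_status(converted, total, output_bytes):
    """Render the counts like the indicators above (decimal megabytes)."""
    return f"{converted}/{total} PDFs converted, {output_bytes / 1_000_000:.1f} MB output"
```

Megabytes are computed as 10^6 bytes here. Whether the card's "3.5 MB" uses decimal or binary units is not stated.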
## Troubleshooting Quick Fixes

| Issue | Fix |
|---|---|
| `mise` not found | Install it: `curl https://mise.jdx.dev/install.sh \| sh` |
| Config not trusted | `mise trust` |
| Dependencies missing | `mise run install` |
| No PDFs found | Check that `ls artikel/*.pdf` lists your files |
| Python not found | Let the first run finish (mise may need to install Python first, which takes longer), or run `mise install` |
## Documentation Map

| Question | See |
|---|---|
| How do I use it? | `README.md` |
| How does the script work? | `PDF_CONVERTER_GUIDE.md` |
| How does mise work? | `MISE_GUIDE.md` |
| What does each task do? | `mise.toml` |
## Conversion Pipeline

```text
Input PDFs (artikel/*.pdf)
        ↓
[Python Script]
  - Read PDF
  - Extract metadata
  - Extract text
  - Format Markdown
        ↓
Output Markdown (artikel/converted/*.md)
```
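The first three pipeline stages (read, extract metadata, extract text) can be sketched with pypdf's `PdfReader`. Its `metadata` attribute (which can be `None`), its `pages`, and `page.extract_text()` are pypdf API. The helper names `extract` and `read_pdf`, and the fallbacks (file stem as title, `"Unknown"` as author), are assumptions for illustration, not necessarily what `pdf_to_markdown.py` does:

```python
from pathlib import Path


def extract(reader, source_name):
    """Return (info, pages) from a pypdf.PdfReader-like object.

    Hypothetical helper. The title/author fallbacks are illustrative choices.
    """
    meta = reader.metadata  # None when the PDF has no document info dictionary
    title = meta.title if meta and meta.title else Path(source_name).stem
    author = meta.author if meta and meta.author else "Unknown"
    try:
        created = meta.creation_date if meta else None
    except ValueError:  # guard against date strings pypdf cannot parse
        created = None
    # extract_text() can come back empty for scanned, image-only pages.
    pages = [(page.extract_text() or "").strip() for page in reader.pages]
    info = {"title": title, "author": author, "created": created, "source": source_name}
    return info, pages


def read_pdf(path):
    """Open a PDF with pypdf and run extract() on it."""
    from pypdf import PdfReader  # deferred so extract() works without pypdf installed

    return extract(PdfReader(path), Path(path).name)
```

Keeping `extract` separate from `PdfReader` lets the metadata and text handling be exercised with a stand-in object. The Format Markdown stage would then turn `info` and `pages` into the layout shown under Output Format.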
Print this card for quick reference! 📋