# Quick Reference Card

## Mise Commands

```bash
# Main conversion
mise run convert           # Convert all PDFs

# Logging options
mise run convert-verbose   # Show detailed logs
mise run convert-quiet     # Errors only

# Preview & Check
mise run dry-run           # Preview without writing files
mise run status            # Show progress

# Custom paths
INPUT_DIR=/path mise run convert-custom
INPUT_DIR=/in OUTPUT_DIR=/out mise run convert-custom

# Cleanup
mise run clean             # Remove Markdown output only
mise run clean-all         # Remove all artifacts

# Help
mise tasks                 # List all tasks
mise run help              # Show task info
```
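The task names suggest how the tasks might map onto options of `pdf_to_markdown.py`. This card does not show how `mise.toml` actually invokes the script, so the sketch below is built on assumptions: the flags `--verbose`, `--quiet`, `--dry-run`, `--input-dir` and `--output-dir`, and the helper `parse_options`, are hypothetical. It also assumes `INPUT_DIR`/`OUTPUT_DIR` only supply defaults and that the output folder defaults to `converted/` inside the input folder.

```python
import argparse
import logging
import os
from pathlib import Path


def parse_options(argv=None, env=None):
    """Parse hypothetical flags that mirror the mise task names."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Convert PDFs to Markdown (sketch)")
    # INPUT_DIR only provides the default; an explicit --input-dir wins.
    parser.add_argument("--input-dir", type=Path, default=Path(env.get("INPUT_DIR", "artikel")))
    parser.add_argument("--output-dir", type=Path, default=None)
    parser.add_argument("--dry-run", action="store_true", help="list planned conversions, write nothing")
    level = parser.add_mutually_exclusive_group()
    level.add_argument("--verbose", action="store_true", help="DEBUG-level logs")
    level.add_argument("--quiet", action="store_true", help="errors only")
    args = parser.parse_args(argv)

    # Output folder precedence: --output-dir, then OUTPUT_DIR, then <input>/converted.
    if args.output_dir is None:
        out = env.get("OUTPUT_DIR")
        args.output_dir = Path(out) if out else args.input_dir / "converted"

    # Map the verbosity switches onto standard logging levels.
    args.log_level = logging.DEBUG if args.verbose else logging.ERROR if args.quiet else logging.INFO
    return args
```

Under these assumptions, a `convert-quiet` task would run something like `python pdf_to_markdown.py --quiet`, and the script's entry point would pass `opts.log_level` to `logging.basicConfig(level=...)`.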

## File Locations

```
artikel/
├── *.pdf            # Input PDFs
└── converted/
    └── *.md         # Output Markdown
```
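The layout implies a one-to-one mapping from `artikel/<name>.pdf` to `artikel/converted/<name>.md`. A minimal sketch of that mapping, assuming the output keeps the PDF's base name; `find_pdfs` and `markdown_path` are hypothetical helper names, not necessarily what the script uses.

```python
from pathlib import Path


def find_pdfs(input_dir):
    """Return the PDFs directly inside input_dir, sorted for a stable order."""
    # Like `ls artikel/*.pdf`, the pattern is case-sensitive on Linux (.PDF is skipped).
    return sorted(p for p in Path(input_dir).glob("*.pdf") if p.is_file())


def markdown_path(pdf, output_dir=None):
    """Map <dir>/<name>.pdf to <dir>/converted/<name>.md, or <output_dir>/<name>.md."""
    pdf = Path(pdf)
    out_dir = Path(output_dir) if output_dir else pdf.parent / "converted"
    return out_dir / f"{pdf.stem}.md"
```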
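
## One-Liner Setup

```bash
curl https://mise.jdx.dev/install.sh | sh && cd maturaarbeit && mise trust && mise run convert
```

Run this from the directory that contains the `maturaarbeit` checkout. If `mise` is not found right after installation, add `~/.local/bin` (the installer's default location) to your `PATH` and rerun the remaining commands.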

## Output Format

```markdown
---
title: PDF Title
author: PDF Author
created: 2024-02-23
converted: 2024-02-23 14:32:15
source: filename.pdf
---

# PDF Title

## Page 1
[Text...]

## Page 2
[Text...]
```
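The layout above can be rendered from the extracted metadata and per-page text. A minimal sketch, assuming pages are numbered from 1 and `converted` is the local time of the run; `build_markdown` and `yaml_scalar` are hypothetical names. Values stay unquoted as on the card unless they contain characters that could break YAML parsing (for example a title like `Klima: Wandel`).

```python
import json
from datetime import date, datetime
from typing import Iterable, Optional

# Characters that can change how a plain (unquoted) YAML scalar is parsed; a rough check.
_UNSAFE = set(':#"\'\n[]{},&*!|>%@`')


def yaml_scalar(value: str) -> str:
    """Leave simple text unquoted, as on the card; quote anything risky."""
    text = str(value)
    if text and text == text.strip() and not (_UNSAFE & set(text)) and not text.startswith(("-", "?")):
        return text
    # A JSON string literal is also a valid YAML double-quoted scalar.
    return json.dumps(text, ensure_ascii=False)


def build_markdown(title: str, author: str, created: Optional[date], source: str,
                   pages: Iterable[str], converted: Optional[datetime] = None) -> str:
    """Render YAML front matter, an H1 title, and one '## Page N' section per page."""
    converted = converted or datetime.now()  # local time of the conversion run
    lines = [
        "---",
        f"title: {yaml_scalar(title)}",
        f"author: {yaml_scalar(author)}",
        f"created: {created:%Y-%m-%d}" if created else "created:",  # empty value = YAML null
        f"converted: {converted:%Y-%m-%d %H:%M:%S}",
        f"source: {yaml_scalar(source)}",
        "---",
        "",
        f"# {title}",
    ]
    for number, text in enumerate(pages, start=1):
        lines += ["", f"## Page {number}", text.strip()]
    return "\n".join(lines) + "\n"
```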

## Success Indicators

- ✅ All tasks complete
- ✅ 18/18 PDFs converted
- ✅ 3.5 MB output
- ✅ No errors
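To check the "18/18 PDFs converted" indicator yourself, compare the base names of the PDFs in `artikel/` with the Markdown files in `artikel/converted/`. A sketch, assuming each output file keeps its PDF's base name; `report` is a hypothetical helper.

```python
from pathlib import Path


def report(input_dir="artikel", output_dir=None):
    """Compare PDFs in input_dir with Markdown files in output_dir by base name."""
    input_dir = Path(input_dir)
    output_dir = Path(output_dir) if output_dir else input_dir / "converted"
    pdf_stems = {p.stem for p in input_dir.glob("*.pdf")}
    md_files = list(output_dir.glob("*.md"))
    md_stems = {p.stem for p in md_files}
    return {
        "pdfs": len(pdf_stems),
        "converted": len(pdf_stems & md_stems),
        "missing": sorted(pdf_stems - md_stems),  # PDFs with no matching .md yet
        # Size of the Markdown files only, in decimal megabytes (1 MB = 1,000,000 bytes).
        "output_mb": round(sum(p.stat().st_size for p in md_files) / 1_000_000, 1),
    }
```

Run from the project root, a complete conversion should give `converted == pdfs` and an empty `missing` list. The card does not say exactly what the 3.5 MB figure measures, so treat `output_mb` as approximate.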

## Troubleshooting Quick Fixes

| Issue | Fix |
|-------|-----|
| mise not found | `curl https://mise.jdx.dev/install.sh \| sh` |
| Config not trusted | `mise trust` |
| Dependencies missing | `mise run install` |
| No PDFs found | Check that PDFs exist: `ls artikel/*.pdf` |
| Python not found | Run `mise install` to install the tools declared in `mise.toml`; the first run may take a while |

## Documentation Map

| Question | See |
|----------|-----|
| How to use? | README.md |
| How does the script work? | PDF_CONVERTER_GUIDE.md |
| How does mise work? | MISE_GUIDE.md |
| Task details? | mise.toml |

## Conversion Pipeline

```
Input PDFs (artikel/*.pdf)
        ↓
[Python Script]
  - Read PDF
  - Extract metadata
  - Extract text
  - Format Markdown
        ↓
Output Markdown (artikel/converted/*.md)
```
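The read and extract steps map onto pypdf's `PdfReader`: `reader.metadata` exposes the document information dictionary (`title`, `author`, `creation_date`), and each entry of `reader.pages` has `extract_text()`. The sketch below assumes a recent pypdf release and falls back to the file name when a PDF has no title. `extract_pdf` and its `reader_cls` parameter are hypothetical (the parameter only lets the function be tried without a real PDF), and the Markdown formatting step is left out.

```python
import logging
from pathlib import Path

log = logging.getLogger("pdf_to_markdown")


def extract_pdf(pdf_path, reader_cls=None):
    """Read one PDF and return (metadata dict, list of page texts)."""
    if reader_cls is None:
        from pypdf import PdfReader as reader_cls  # imported lazily; requires pypdf
    pdf_path = Path(pdf_path)
    reader = reader_cls(str(pdf_path))

    info = reader.metadata  # None when the PDF has no document information dictionary
    try:
        created = info.creation_date if info else None
    except ValueError:  # pypdf may raise when a date string cannot be parsed
        log.warning("%s: unreadable creation date", pdf_path.name)
        created = None
    meta = {
        "title": (info.title if info else None) or pdf_path.stem,
        "author": (info.author if info else None) or "",
        "created": created,
        "source": pdf_path.name,
    }

    pages = []
    for number, page in enumerate(reader.pages, start=1):
        try:
            pages.append(page.extract_text() or "")
        except Exception:  # log and continue so one bad page does not abort the file
            log.exception("%s: text extraction failed on page %d", pdf_path.name, number)
            pages.append("")
    return meta, pages
```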

---

Print this card for quick reference! 📋