MM4go c7ff6a8a29 Add PDF to Markdown converter with mise task runner

- Implement pdf_to_markdown.py script with pypdf for text extraction
- Extract metadata (title, author, creation date) from PDFs
- Generate clean Markdown files with YAML front matter
- Add comprehensive error handling and logging
- Create mise.toml with 10+ convenient tasks for conversion
- Provide detailed documentation (4 guides + quick reference)
- Successfully convert all 18 PDF files in artikel/ folder to Markdown
- Include .gitignore for Python cache and local config

2026-02-23 14:58:58 +01:00


Quick Reference Card

Mise Commands

# Main conversion
mise run convert              # Convert all PDFs

# Logging options
mise run convert-verbose      # Show detailed logs
mise run convert-quiet        # Errors only

# Preview & Check
mise run dry-run             # Preview without writing
mise run status              # Show progress

# Custom paths
INPUT_DIR=/path mise run convert-custom
INPUT_DIR=/in OUTPUT_DIR=/out mise run convert-custom

# Cleanup
mise run clean               # Remove markdown only
mise run clean-all           # Remove all artifacts

# Help
mise tasks                   # List all tasks
mise run help                # Show task info
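The custom-path commands pass `INPUT_DIR` and `OUTPUT_DIR` as environment variables. A script could pick them up roughly as sketched below. The function name `resolve_dirs` and the fallback rules are assumptions for illustration only, with defaults taken from the File Locations layout; the actual `pdf_to_markdown.py` and `mise.toml` may wire this differently.

```python
import os
from pathlib import Path


def resolve_dirs(env=None):
    """Return (input_dir, output_dir) from the environment, with fallbacks.

    Assumed defaults: input is artikel/, output is <input>/converted.
    """
    env = os.environ if env is None else env
    input_dir = Path(env.get("INPUT_DIR") or "artikel")
    # Assumption: with only INPUT_DIR set, output lands in <input>/converted.
    output_dir = Path(env.get("OUTPUT_DIR") or input_dir / "converted")
    return input_dir, output_dir
```

Passing a plain dict instead of `os.environ` keeps the helper easy to try out, for example `resolve_dirs({"INPUT_DIR": "/in", "OUTPUT_DIR": "/out"})`.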

File Locations

artikel/
├── *.pdf                    # Input PDFs
└── converted/
    └── *.md                 # Output Markdown
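Given this layout, each `artikel/<name>.pdf` should end up as `artikel/converted/<name>.md`. The sketch below shows one way to express that mapping and to count progress, similar in spirit to `mise run status`. The helper names `output_path_for` and `conversion_status` are hypothetical and not taken from the repository.

```python
from pathlib import Path


def output_path_for(pdf_path: Path, output_dir: Path) -> Path:
    """Map e.g. artikel/foo.pdf to artikel/converted/foo.md."""
    return output_dir / (pdf_path.stem + ".md")


def conversion_status(input_dir: Path, output_dir: Path):
    """Return (converted, total) for the PDFs directly inside input_dir."""
    pdfs = sorted(input_dir.glob("*.pdf"))
    done = sum(1 for pdf in pdfs if output_path_for(pdf, output_dir).exists())
    return done, len(pdfs)
```

Calling `conversion_status(Path("artikel"), Path("artikel/converted"))` returns a pair such as `(18, 18)` once every PDF has a matching Markdown file.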

One-Liner Setup

curl https://mise.jdx.dev/install.sh | sh && cd maturaarbeit && mise trust && mise run convert

Output Format

---
title: PDF Title
author: PDF Author
created: 2024-02-23
converted: 2024-02-23 14:32:15
source: filename.pdf
---

# PDF Title

## Page 1
[Text...]

## Page 2
[Text...]
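To make the layout above concrete, here is a minimal sketch of how such a file could be assembled from already-extracted values. `render_markdown` and its parameters are hypothetical names, not the script's actual API, and the sketch writes values verbatim, so titles containing characters like `:` would need YAML quoting in real use.

```python
from datetime import datetime


def render_markdown(title, author, created, source, pages, converted=None):
    """Build front matter, a title heading, and one section per page."""
    converted = converted or datetime.now()
    lines = [
        "---",
        f"title: {title}",
        f"author: {author}",
        f"created: {created}",
        f"converted: {converted:%Y-%m-%d %H:%M:%S}",
        f"source: {source}",
        "---",
        "",
        f"# {title}",
    ]
    # Pages are numbered from 1, matching the "## Page N" headings.
    for number, text in enumerate(pages, start=1):
        lines += ["", f"## Page {number}", text.strip()]
    return "\n".join(lines) + "\n"
```

The `converted` timestamp defaults to the current time; passing it explicitly makes the output reproducible.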

Success Indicators

✅ All tasks complete
✅ 18/18 PDFs converted
✅ 3.5 MB output
✅ No errors

Troubleshooting Quick Fixes

| Issue | Fix |
|-------|-----|
| mise not found | `curl https://mise.jdx.dev/install.sh \| sh` |
| Config not trusted | `mise trust` |
| Dependencies missing | `mise run install` |
| No PDFs found | Check `ls artikel/*.pdf` |
| Python not found | First run may take longer |

Documentation Map

| Question | See |
|----------|-----|
| How to use? | README.md |
| How does the script work? | PDF_CONVERTER_GUIDE.md |
| How does mise work? | MISE_GUIDE.md |
| Task details? | mise.toml |

Conversion Pipeline

Input PDFs (artikel/*.pdf)
          ↓
    [Python Script]
    - Read PDF
    - Extract metadata
    - Extract text
    - Format Markdown
          ↓
Output Markdown (artikel/converted/*.md)
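The pipeline above can be sketched in a few functions. This is an illustration under stated assumptions, not the contents of `pdf_to_markdown.py`: the function names are hypothetical, front matter is left out to keep it short, and only the pypdf calls (`PdfReader`, `reader.metadata`, `page.extract_text()`) come from the library itself. Each PDF is converted inside its own `try` block so that one unreadable file does not stop the batch.

```python
import logging
from pathlib import Path

log = logging.getLogger("pdf_to_markdown")


def read_with_pypdf(pdf_path):
    """Read a PDF with pypdf and return (metadata dict, list of page texts)."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    reader = PdfReader(str(pdf_path))
    info = reader.metadata  # may be None when the PDF carries no metadata
    # Assumption: fall back to the file name when no title is stored.
    meta = {"title": (info.title if info else None) or pdf_path.stem}
    pages = [page.extract_text() or "" for page in reader.pages]
    return meta, pages


def convert_pdf(pdf_path, output_dir, read_pdf=read_with_pypdf):
    """Run one PDF through the pipeline and return the written .md path."""
    meta, pages = read_pdf(pdf_path)
    body = [f"# {meta['title']}"]
    for number, text in enumerate(pages, start=1):
        body += ["", f"## Page {number}", text.strip()]
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / (pdf_path.stem + ".md")
    target.write_text("\n".join(body) + "\n", encoding="utf-8")
    return target


def convert_all(input_dir, output_dir, read_pdf=read_with_pypdf):
    """Convert every *.pdf in input_dir; return (converted, failed) counts."""
    converted = failed = 0
    for pdf_path in sorted(input_dir.glob("*.pdf")):
        try:
            convert_pdf(pdf_path, output_dir, read_pdf)
            converted += 1
        except Exception as exc:  # log the failure and continue with the rest
            log.error("Failed to convert %s: %s", pdf_path.name, exc)
            failed += 1
    return converted, failed
```

The `read_pdf` parameter exists only so the reading step can be swapped out, for example with a stub in tests; `convert_all(Path("artikel"), Path("artikel/converted"))` uses pypdf by default.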

Print this card for quick reference! 📋
