- Remove duplicated artikel/artikel/ nested directory - Move PDFs and resources to artikel/ root level - Maintain clean directory structure for PDF conversion
PDF to Markdown Converter - Complete Setup
A production-ready Python script with mise task runner for converting PDF files to Markdown format.
🚀 Quick Start
One-Command Setup
# Install mise (if not already installed)
curl https://mise.jdx.dev/install.sh | sh
# Navigate to project
cd maturaarbeit
# Convert all PDFs to Markdown
mise run convert
That's it! ✨
📦 What's Included
Core Files
| File | Purpose |
|---|---|
| pdf_to_markdown.py | Main conversion script (373 lines) |
| requirements.txt | Python dependencies (pypdf, python-dateutil) |
| mise.toml | Task runner configuration with 10+ tasks |
| .mise.local.toml | Local environment overrides (git-ignored) |
| .gitignore | Git exclusions for cache and build artifacts |
Documentation
| File | Purpose |
|---|---|
| README.md | This file - overview and quick start |
| PDF_CONVERTER_GUIDE.md | Complete usage guide for the Python script |
| MISE_GUIDE.md | Detailed mise task runner documentation |
Converted Files
- artikel/converted/ - 18 Markdown files (one per PDF)
- All PDFs successfully converted ✓
🎯 Key Features
PDF Conversion
✅ Extract text from all pages
✅ Preserve page structure with page headers
✅ Extract metadata (title, author, creation date)
✅ Generate YAML front matter
✅ Handle errors gracefully
✅ Progress reporting and summary
Mise Task Runner
✅ Automatic Python installation (3.11)
✅ Automatic dependency installation
✅ Reproducible builds
✅ Isolated environment
✅ 10+ convenient tasks
✅ Custom path support
📋 Available Tasks
Run with: mise run <task-name>
Main Tasks
mise run convert # Convert all PDFs (main task)
mise run convert-verbose # Convert with detailed logging
mise run convert-quiet # Convert silently
mise run dry-run # Preview without writing
Utilities
mise run status # Show conversion progress
mise run install # Install dependencies
mise run clean # Remove converted markdown
mise run clean-all # Remove all artifacts
Custom Conversion
INPUT_DIR=/path/to/pdfs mise run convert-custom
INPUT_DIR=/path OUTPUT_DIR=/out mise run convert-custom
📖 Documentation Guide
For Quick Start
👉 Read this file (README.md)
For Python Script Details
👉 See PDF_CONVERTER_GUIDE.md for:
- Installation instructions
- Usage examples
- Troubleshooting
- How the script works
- Customization options
For Mise Task Runner
👉 See MISE_GUIDE.md for:
- Mise installation and setup
- Task configuration
- Advanced usage
- CI/CD integration
- Custom task creation
🔧 Usage Examples
Convert All PDFs (Default)
mise run convert
Output: 18 Markdown files in artikel/converted/
Convert with Verbose Logging
mise run convert-verbose
Shows detailed progress for each PDF.
Preview Conversion
mise run dry-run
Shows what would be converted without writing files.
Check Status
mise run status
Output:
=== PDF Conversion Status ===
PDF files in artikel/: 18
Markdown files in artikel/converted/: 18
✓ All PDFs converted!
📁 Output Format
Each converted PDF becomes a Markdown file with:
---
title: Document Title
author: Author Name
created: 2024-02-23
converted: 2024-02-23 14:57:05
source: original.pdf
---
# Document Title
## Page 1
[Extracted text...]
## Page 2
[Extracted text...]
🛠️ Technical Stack
- Language: Python 3.11
- PDF Library: pypdf 6.7.2
- Date Parsing: python-dateutil 2.9.0
- Task Runner: mise 2026.2.19
- Total Script Size: 12 KB
- Converted Files: 3.5 MB (18 PDFs → Markdown)
✅ Conversion Results
Status: ✓ All 18 PDFs successfully converted
| Metric | Value |
|---|---|
| Total PDFs | 18 |
| Converted | 18 |
| Failed | 0 |
| Conversion Time | ~28 seconds |
| Output Size | 3.5 MB |
Converted Documents
- bewegendeGefühle.md
- ChoreografiealsKulturteknik.md
- Choreografie Handwerk und Vision.md
- Handout-Choreografieren.md
- Klänge in Bewegung.md
- PersoenlichkeitsentwicklungdurchTanzUniBE.md
- PsychologyofSport&Exercise.md
- SinnundSinneimTanz.md
- Sportschule.pdf
- Sportunterricht.md
- TanzPsychotherapeutischeHilfe.md
- TanzpraxisinderForschung.md
- WirkfaktorenvonTanz.md
- Zwischen Rhythmus und Leistung.md
- bewegendeGefühle.md
- choreo.md
- choreografiekonzepte_kurz.md
- studienpsychischergesundheittanztherapie.md
🔄 Workflows
Standard Workflow
# Check status before
mise run status
# Convert PDFs
mise run convert
# Verify conversion
mise run status
# Clean if needed
mise run clean-all
Development Workflow
# Preview what would happen
mise run dry-run
# Run with verbose logging
mise run convert-verbose
# Review results
ls -lh artikel/converted/
# Check specific file
cat artikel/converted/choreo.md | head -20
CI/CD Integration
# In GitHub Actions, GitLab CI, etc.
curl https://mise.jdx.dev/install.sh | sh
mise run convert
mise run status
🚨 Troubleshooting
Common Issues
Issue: "mise: command not found"
Solution: Install mise: curl https://mise.jdx.dev/install.sh | sh
Issue: "Config files are not trusted"
Solution: Run mise trust
Issue: "No PDF files found"
Solution: Check input folder: ls artikel/*.pdf
Issue: Python dependencies not installing
Solution: Run mise run install manually
For detailed troubleshooting, see PDF_CONVERTER_GUIDE.md or MISE_GUIDE.md.
📚 Additional Resources
- Mise Documentation: https://mise.jdx.dev/
- pypdf Documentation: https://py-pdf.github.io/pypdf/
- Project Issues: https://github.com/anomalyco/opencode
📝 Project Structure
maturaarbeit/
├── pdf_to_markdown.py # Main script
├── requirements.txt # Dependencies
├── mise.toml # Task configuration
├── .mise.local.toml # Local overrides (git-ignored)
├── .gitignore # Git exclusions
│
├── README.md # This file
├── PDF_CONVERTER_GUIDE.md # Python script guide
├── MISE_GUIDE.md # Task runner guide
│
├── artikel/ # Input PDFs
│ ├── *.pdf # 18 PDF files
│ └── converted/ # Output Markdown
│ └── *.md # 18 Markdown files
│
└── .git/ # Version control
🎓 Learning Path
For Users:
- Read this README
- Run
mise run convert - View results in
artikel/converted/ - Read PDF_CONVERTER_GUIDE.md for details
For Developers:
- Read MISE_GUIDE.md for task runner
- Examine
mise.tomlfor configuration - Review
pdf_to_markdown.pyfor implementation - Customize as needed
🔐 Security
- ✅ No external API calls
- ✅ All processing local
- ✅ No data transmission
- ✅ Git-ignored local config
- ✅ Standard Python libraries
📄 License
This project is provided as-is for your use.
👥 Support
- Mise Issues: https://mise.jdx.dev/
- PDF Conversion Issues: See PDF_CONVERTER_GUIDE.md
- Task Runner Issues: See MISE_GUIDE.md
- Project Feedback: https://github.com/anomalyco/opencode
Project Version: 1.0
Last Updated: February 23, 2026
Status: ✅ Complete and Tested