# PDF to Markdown Converter - Complete Setup A production-ready Python script with **mise** task runner for converting PDF files to Markdown format. ## πŸš€ Quick Start ### One-Command Setup ```bash # Install mise (if not already installed) curl https://mise.jdx.dev/install.sh | sh # Navigate to project cd maturaarbeit # Convert all PDFs to Markdown mise run convert ``` That's it! ✨ ## πŸ“¦ What's Included ### Core Files | File | Purpose | |------|---------| | **pdf_to_markdown.py** | Main conversion script (373 lines) | | **requirements.txt** | Python dependencies (pypdf, python-dateutil) | | **mise.toml** | Task runner configuration with 10+ tasks | | **.mise.local.toml** | Local environment overrides (git-ignored) | | **.gitignore** | Git exclusions for cache and build artifacts | ### Documentation | File | Purpose | |------|---------| | **README.md** | This file - overview and quick start | | **PDF_CONVERTER_GUIDE.md** | Complete usage guide for the Python script | | **MISE_GUIDE.md** | Detailed mise task runner documentation | ### Converted Files - **artikel/converted/** - 18 Markdown files (one per PDF) - All PDFs successfully converted βœ“ ## 🎯 Key Features ### PDF Conversion βœ… Extract text from all pages βœ… Preserve page structure with page headers βœ… Extract metadata (title, author, creation date) βœ… Generate YAML front matter βœ… Handle errors gracefully βœ… Progress reporting and summary ### Mise Task Runner βœ… Automatic Python installation (3.11) βœ… Automatic dependency installation βœ… Reproducible builds βœ… Isolated environment βœ… 10+ convenient tasks βœ… Custom path support ## πŸ“‹ Available Tasks Run with: `mise run ` ### Main Tasks ```bash mise run convert # Convert all PDFs (main task) mise run convert-verbose # Convert with detailed logging mise run convert-quiet # Convert silently mise run dry-run # Preview without writing ``` ### Utilities ```bash mise run status # Show conversion progress mise run install # Install dependencies mise run clean # Remove converted markdown mise run clean-all # Remove all artifacts ``` ### Custom Conversion ```bash INPUT_DIR=/path/to/pdfs mise run convert-custom INPUT_DIR=/path OUTPUT_DIR=/out mise run convert-custom ``` ## πŸ“– Documentation Guide ### For Quick Start πŸ‘‰ Read this file (README.md) ### For Python Script Details πŸ‘‰ See **PDF_CONVERTER_GUIDE.md** for: - Installation instructions - Usage examples - Troubleshooting - How the script works - Customization options ### For Mise Task Runner πŸ‘‰ See **MISE_GUIDE.md** for: - Mise installation and setup - Task configuration - Advanced usage - CI/CD integration - Custom task creation ## πŸ”§ Usage Examples ### Convert All PDFs (Default) ```bash mise run convert ``` Output: 18 Markdown files in `artikel/converted/` ### Convert with Verbose Logging ```bash mise run convert-verbose ``` Shows detailed progress for each PDF. ### Preview Conversion ```bash mise run dry-run ``` Shows what would be converted without writing files. ### Check Status ```bash mise run status ``` Output: ``` === PDF Conversion Status === PDF files in artikel/: 18 Markdown files in artikel/converted/: 18 βœ“ All PDFs converted! ``` ## πŸ“ Output Format Each converted PDF becomes a Markdown file with: ```markdown --- title: Document Title author: Author Name created: 2024-02-23 converted: 2024-02-23 14:57:05 source: original.pdf --- # Document Title ## Page 1 [Extracted text...] ## Page 2 [Extracted text...] ``` ## πŸ› οΈ Technical Stack - **Language:** Python 3.11 - **PDF Library:** pypdf 6.7.2 - **Date Parsing:** python-dateutil 2.9.0 - **Task Runner:** mise 2026.2.19 - **Total Script Size:** 12 KB - **Converted Files:** 3.5 MB (18 PDFs β†’ Markdown) ## βœ… Conversion Results **Status:** βœ“ All 18 PDFs successfully converted | Metric | Value | |--------|-------| | Total PDFs | 18 | | Converted | 18 | | Failed | 0 | | Conversion Time | ~28 seconds | | Output Size | 3.5 MB | ### Converted Documents - bewegendeGefΓΌhle.md - ChoreografiealsKulturteknik.md - Choreografie Handwerk und Vision.md - Handout-Choreografieren.md - KlΓ€nge in Bewegung.md - PersoenlichkeitsentwicklungdurchTanzUniBE.md - PsychologyofSport&Exercise.md - SinnundSinneimTanz.md - Sportschule.pdf - Sportunterricht.md - TanzPsychotherapeutischeHilfe.md - TanzpraxisinderForschung.md - WirkfaktorenvonTanz.md - Zwischen Rhythmus und Leistung.md - bewegendeGefΓΌhle.md - choreo.md - choreografiekonzepte_kurz.md - studienpsychischergesundheittanztherapie.md ## πŸ”„ Workflows ### Standard Workflow ```bash # Check status before mise run status # Convert PDFs mise run convert # Verify conversion mise run status # Clean if needed mise run clean-all ``` ### Development Workflow ```bash # Preview what would happen mise run dry-run # Run with verbose logging mise run convert-verbose # Review results ls -lh artikel/converted/ # Check specific file cat artikel/converted/choreo.md | head -20 ``` ### CI/CD Integration ```bash # In GitHub Actions, GitLab CI, etc. curl https://mise.jdx.dev/install.sh | sh mise run convert mise run status ``` ## 🚨 Troubleshooting ### Common Issues **Issue:** "mise: command not found" **Solution:** Install mise: `curl https://mise.jdx.dev/install.sh | sh` **Issue:** "Config files are not trusted" **Solution:** Run `mise trust` **Issue:** "No PDF files found" **Solution:** Check input folder: `ls artikel/*.pdf` **Issue:** Python dependencies not installing **Solution:** Run `mise run install` manually For detailed troubleshooting, see **PDF_CONVERTER_GUIDE.md** or **MISE_GUIDE.md**. ## πŸ“š Additional Resources - **Mise Documentation:** https://mise.jdx.dev/ - **pypdf Documentation:** https://py-pdf.github.io/pypdf/ - **Project Issues:** https://github.com/anomalyco/opencode ## πŸ“ Project Structure ``` maturaarbeit/ β”œβ”€β”€ pdf_to_markdown.py # Main script β”œβ”€β”€ requirements.txt # Dependencies β”œβ”€β”€ mise.toml # Task configuration β”œβ”€β”€ .mise.local.toml # Local overrides (git-ignored) β”œβ”€β”€ .gitignore # Git exclusions β”‚ β”œβ”€β”€ README.md # This file β”œβ”€β”€ PDF_CONVERTER_GUIDE.md # Python script guide β”œβ”€β”€ MISE_GUIDE.md # Task runner guide β”‚ β”œβ”€β”€ artikel/ # Input PDFs β”‚ β”œβ”€β”€ *.pdf # 18 PDF files β”‚ └── converted/ # Output Markdown β”‚ └── *.md # 18 Markdown files β”‚ └── .git/ # Version control ``` ## πŸŽ“ Learning Path **For Users:** 1. Read this README 2. Run `mise run convert` 3. View results in `artikel/converted/` 4. Read **PDF_CONVERTER_GUIDE.md** for details **For Developers:** 1. Read **MISE_GUIDE.md** for task runner 2. Examine `mise.toml` for configuration 3. Review `pdf_to_markdown.py` for implementation 4. Customize as needed ## πŸ” Security - βœ… No external API calls - βœ… All processing local - βœ… No data transmission - βœ… Git-ignored local config - βœ… Standard Python libraries ## πŸ“„ License This project is provided as-is for your use. ## πŸ‘₯ Support - **Mise Issues:** https://mise.jdx.dev/ - **PDF Conversion Issues:** See **PDF_CONVERTER_GUIDE.md** - **Task Runner Issues:** See **MISE_GUIDE.md** - **Project Feedback:** https://github.com/anomalyco/opencode --- **Project Version:** 1.0 **Last Updated:** February 23, 2026 **Status:** βœ… Complete and Tested