maturaarbeit/MISE_GUIDE.md
MM4go c7ff6a8a29 Add PDF to Markdown converter with mise task runner
- Implement pdf_to_markdown.py script with pypdf for text extraction
- Extract metadata (title, author, creation date) from PDFs
- Generate clean Markdown files with YAML front matter
- Add comprehensive error handling and logging
- Create mise.toml with 10+ convenient tasks for conversion
- Provide detailed documentation (4 guides + quick reference)
- Successfully convert all 18 PDF files in artikel/ folder to Markdown
- Include .gitignore for Python cache and local config
2026-02-23 14:58:58 +01:00

# Mise en Place - PDF to Markdown Converter
A modern task runner configuration for the PDF to Markdown conversion project using [mise](https://mise.jdx.dev/).
## Overview
Mise is a polyglot tool manager that handles tool installations and task execution. This project uses it to:
- Automatically install Python 3.11 and dependencies
- Provide convenient commands for PDF conversion tasks
- Manage development workflows
- Track conversion status
## Installation
### Prerequisites
- **mise** CLI installed: https://mise.jdx.dev/getting-started.html
Quick install:
```bash
curl https://mise.jdx.dev/install.sh | sh
```
### Setup
```bash
# Clone or navigate to the project
cd maturaarbeit
# Trust the configuration files (one-time setup)
mise trust
# Verify installation
mise tasks
```
## Quick Start
### Convert All PDFs
```bash
mise run convert
```
This will:
1. Install dependencies (if not already installed)
2. Run the PDF to Markdown converter
3. Process all PDFs in the `artikel/` folder
4. Output Markdown files to `artikel/converted/`
5. Display a conversion summary
### Check Conversion Status
```bash
mise run status
```
Shows:
- Number of PDFs in `artikel/`
- Number of converted Markdown files
- A "✓ All PDFs converted" message once every PDF has been converted
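This guide does not show how `status` arrives at these numbers. A minimal sketch, assuming it pairs each PDF with a Markdown file of the same base name (the function name `conversion_status` and the name-matching rule are hypothetical and not taken from `mise.toml`):

```python
from pathlib import Path


def conversion_status(input_dir: str, output_dir: str) -> dict:
    """Compare the PDFs in input_dir with the Markdown files in output_dir."""
    # Assumption: a.pdf is considered converted when a.md exists.
    pdf_stems = {p.stem for p in Path(input_dir).glob("*.pdf")}
    md_stems = {p.stem for p in Path(output_dir).glob("*.md")}
    missing = sorted(pdf_stems - md_stems)
    return {
        "pdfs": len(pdf_stems),
        "converted": len(pdf_stems & md_stems),
        "missing": missing,
        # Only report "done" when there was something to convert.
        "done": bool(pdf_stems) and not missing,
    }
```

Calling `conversion_status("artikel", "artikel/converted")` returns the counts listed above plus the base names that still lack a Markdown file.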
### Preview Without Writing
```bash
mise run dry-run
```
Shows which PDFs would be converted without writing any files.
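A dry run is usually implemented by building the full list of planned conversions and skipping only the write step. The sketch below illustrates that pattern. It is not the code in `pdf_to_markdown.py`: `convert_all` and the `extract_text` callback are hypothetical names standing in for the real extraction logic.

```python
from pathlib import Path


def convert_all(input_dir, output_dir, extract_text, dry_run=False):
    """Convert every PDF in input_dir; with dry_run=True only report the plan."""
    out = Path(output_dir)
    planned = []
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        target = out / f"{pdf.stem}.md"
        planned.append((pdf, target))
        if dry_run:
            # Nothing is created or written in dry-run mode.
            print(f"[dry-run] would convert {pdf.name} -> {target}")
            continue
        out.mkdir(parents=True, exist_ok=True)
        # extract_text is a placeholder for the real PDF text extraction.
        target.write_text(extract_text(pdf), encoding="utf-8")
    return planned
```

Because both modes share the same loop, the preview cannot drift from what a real run would do.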
## Available Tasks
| Task | Description |
|------|-------------|
| `install` | Install Python 3.11 and project dependencies |
| `convert` | Convert all PDFs to Markdown (main task) |
| `convert-verbose` | Convert with detailed logging output |
| `convert-quiet` | Convert silently (errors only) |
| `dry-run` | Preview conversion without writing files |
| `convert-custom` | Convert from custom input/output folders |
| `status` | Show conversion status and progress |
| `clean` | Remove converted Markdown files |
| `clean-all` | Remove all build artifacts and cache |
| `help` | List all available tasks |
## Usage Examples
### Basic Conversion
```bash
# Convert all PDFs using defaults
mise run convert
# Convert with verbose logging
mise run convert-verbose
# Convert silently
mise run convert-quiet
```
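The three variants differ only in how much the converter logs. One way a script could support this is to map command-line flags to `logging` levels. The flag names `--verbose` and `--quiet` below are assumptions for illustration; check `mise.toml` and `pdf_to_markdown.py` for the options the project actually uses.

```python
import argparse
import logging


def parse_log_level(argv):
    """Map hypothetical --verbose / --quiet flags to a logging level."""
    parser = argparse.ArgumentParser(description="PDF to Markdown converter")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--verbose", action="store_true", help="detailed output")
    group.add_argument("--quiet", action="store_true", help="errors only")
    args = parser.parse_args(argv)
    if args.verbose:
        return logging.DEBUG
    if args.quiet:
        return logging.ERROR
    return logging.INFO


if __name__ == "__main__":
    logging.basicConfig(level=parse_log_level(None))
    logging.info("logging configured")
```

The mutually exclusive group makes argparse reject a call that passes both flags.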
### Custom Paths
```bash
# Convert from custom input directory
INPUT_DIR=/path/to/pdfs mise run convert-custom
# Specify both input and output directories
INPUT_DIR=/path/to/pdfs OUTPUT_DIR=/path/to/output mise run convert-custom
```
### Cleanup
```bash
# Remove only converted markdown files
mise run clean
# Remove all artifacts (markdown files, cache, __pycache__)
mise run clean-all
```
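For readers who want to see what the two cleanup levels amount to, here is a rough Python equivalent. It assumes `clean` deletes only `*.md` files in the output folder and `clean-all` additionally removes `__pycache__` directories. The actual task commands live in `mise.toml` and may differ; both function names are hypothetical.

```python
import shutil
from pathlib import Path


def clean(output_dir):
    """Delete converted Markdown files; return how many were removed."""
    removed = 0
    for md in list(Path(output_dir).glob("*.md")):
        md.unlink()
        removed += 1
    return removed


def clean_all(project_root, output_dir):
    """Run clean() and also remove Python bytecode caches under project_root."""
    removed = clean(output_dir)
    # Materialize the list first so deletion does not disturb the traversal.
    for cache in list(Path(project_root).rglob("__pycache__")):
        shutil.rmtree(cache, ignore_errors=True)
    return removed
```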
## Configuration Files
### `mise.toml`
Main configuration file with all tasks, environment variables, and tool versions.
**Key sections:**
- `[env]` - Environment variables (e.g., `PYTHONUNBUFFERED`)
- `[tasks.*]` - Task definitions with descriptions and commands
- `[tools.python]` - Python version specification (3.11)
- `[tools.pipenv]` - Package manager version
### `.mise.local.toml`
Local overrides for environment-specific configuration. Git-ignored file for personal settings.
**Example customizations:**
```toml
# Environment variables belong in the [env] table
[env]
# Override input/output directories
INPUT_DIR = "./my_pdfs"
OUTPUT_DIR = "./my_output"
# Custom Python path
PYTHON_PATH = "/usr/local/bin/python3"
```
### `.gitignore`
Excludes mise cache and local configuration from version control.
## How It Works
### Automatic Tool Installation
When you run a task, mise automatically:
1. Detects required tools (Python 3.11)
2. Downloads and installs them if missing
3. Creates an isolated environment
4. Executes the task in that environment
### Task Execution
1. **Setup phase** - Install dependencies via `pip install -r requirements.txt`
2. **Execution phase** - Run the Python script with appropriate arguments
3. **Cleanup phase** - Report results and summary
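The final reporting step also determines the exit status, which task runners and shell scripts use to tell success from failure. A minimal sketch, assuming the script collects one success flag per file (the `summarize` function and the shape of `results` are hypothetical):

```python
import sys


def summarize(results):
    """Print a conversion summary and return a process exit code."""
    failed = sorted(name for name, ok in results.items() if not ok)
    print(f"Converted {len(results) - len(failed)} of {len(results)} PDFs")
    for name in failed:
        print(f"  failed: {name}", file=sys.stderr)
    # A non-zero code lets callers detect that at least one file failed.
    return 1 if failed else 0
```

A script would end with something like `sys.exit(summarize(results))` so that a failed file makes the task itself fail.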
### Environment Variables
```bash
PYTHONUNBUFFERED=1 # Real-time output (no buffering)
INPUT_DIR # Custom input folder (default: ./artikel)
OUTPUT_DIR # Custom output folder (default: ./artikel/converted)
```
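On the Python side, these variables can be read with fallbacks to the documented defaults. The sketch below shows one way to do that. `resolve_dirs` is a hypothetical helper, and treating an empty value as unset is an assumption rather than documented behavior.

```python
import os
from pathlib import Path


def resolve_dirs(env=None):
    """Return (input_dir, output_dir), falling back to the documented defaults."""
    env = os.environ if env is None else env
    # "or" also covers variables that are set but empty.
    input_dir = Path(env.get("INPUT_DIR") or "./artikel")
    output_dir = Path(env.get("OUTPUT_DIR") or "./artikel/converted")
    return input_dir, output_dir
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without touching the real environment.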
## Advantages Over Traditional Approach
### Before (Manual Setup)
```bash
# Install Python globally
# Install pip
# Install dependencies
# Hope everything works
python3 pdf_to_markdown.py
```
### After (Mise)
```bash
# One command - everything handled
mise run convert
```
**Benefits:**
- ✅ Reproducible - Same environment every time
- ✅ Isolated - Tools don't affect system Python
- ✅ Fast - Caches installed tools
- ✅ Easy - Single command to run tasks
- ✅ Portable - Works on any system with mise
- ✅ Documented - Task descriptions built-in
- ✅ Flexible - Environment variables for customization
## Troubleshooting
### Issue: "mise: command not found"
**Solution:** Install mise first
```bash
curl https://mise.jdx.dev/install.sh | sh
```
### Issue: "Config files are not trusted"
**Solution:** Trust the configuration
```bash
mise trust
```
### Issue: Python dependencies not installing
**Solution:** Manually install in the mise environment
```bash
mise run install
```
### Issue: "No PDF files found"
**Solution:** Check the input directory path
```bash
# Verify PDFs exist
ls -la artikel/*.pdf
# If in different location, use custom path
INPUT_DIR=/path/to/pdfs mise run convert-custom
```
### Issue: Slow first run
**Solution:** First run downloads and installs tools (one-time). Subsequent runs are fast.
## Advanced Usage
### Running Tasks from Shell Scripts
```bash
#!/bin/bash
# Run the conversion once and branch on its exit code
if mise run convert; then
    echo "Conversion successful"
    mise run status
else
    echo "Conversion failed" >&2
    exit 1
fi
```
### Integrating with CI/CD
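```yaml
# GitHub Actions example (a step inside a job's `steps:` list)
- name: Convert PDFs
  run: |
    curl https://mise.jdx.dev/install.sh | sh
    # The installer places mise in ~/.local/bin by default
    export PATH="$HOME/.local/bin:$PATH"
    mise run convert
```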
### Custom Task Definition
To add a new task, edit `mise.toml`:
```toml
[tasks.my-custom-task]
description = "My custom task description"
run = "echo 'Running custom task'"
depends = ["install"] # Depends on install task
```
Then run:
```bash
mise run my-custom-task
```
## Documentation
- **Project Guide** - See `PDF_CONVERTER_GUIDE.md`
- **Mise Docs** - https://mise.jdx.dev/
- **Python Script** - See `pdf_to_markdown.py`
## Support
For issues or questions:
- Mise documentation: https://mise.jdx.dev/
- Project issues: https://github.com/anomalyco/opencode
---
**Version:** 1.0
**Last Updated:** 2026-02-23