
Gennaro Baratta

Backend Software Engineer

I build backend systems, data-heavy tools, and local-first software where reliability matters more than spectacle.

01

Local-first OCR workbench

my-ocr

An OCR tool for PDFs and scanned images. You review and correct the detected layout before any extraction runs, so the output reflects what you actually marked up rather than what the model guessed.
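A minimal sketch of the review-before-OCR idea, assuming a simple region model. `Region`, `correct`, and `regions_for_ocr` are hypothetical names for illustration, not my-ocr's actual code. Detected regions start unreviewed, a user correction marks them reviewed, and OCR refuses to run while anything is still pending.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical layout region: a box on one page plus a label."""

    page: int
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates
    label: str                               # e.g. "text", "table", "figure"
    reviewed: bool = False                   # set once a person has accepted or corrected it


def correct(region: Region, *, bbox=None, label=None) -> Region:
    """Apply a user correction (or plain acceptance) and mark the region as reviewed."""
    return Region(
        page=region.page,
        bbox=bbox if bbox is not None else region.bbox,
        label=label if label is not None else region.label,
        reviewed=True,
    )


def regions_for_ocr(regions: list[Region]) -> list[Region]:
    """Return the regions to OCR, refusing to proceed while any are unreviewed."""
    pending = [r for r in regions if not r.reviewed]
    if pending:
        raise ValueError(f"{len(pending)} region(s) still need review")
    return regions
```

Refusing to proceed, rather than silently skipping unreviewed regions, keeps the guarantee simple: every region that reaches OCR is one a person has looked at.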

Outcomes

  • Layout regions are editable before OCR runs
  • Exports to markdown, JSON, structured fields, and a run report
  • Rule-based and LLM extraction can be compared on the same document
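Comparing the two extractors on the same document can be as simple as a field-by-field diff. The sketch below assumes both extractors return a flat `dict` mapping field names to string values; `compare_extractions` is a hypothetical helper, not part of my-ocr.

```python
def compare_extractions(rule_based: dict[str, str], llm: dict[str, str]) -> dict[str, list[str]]:
    """Group field names by how the rule-based and LLM extractors agree.

    Values are compared after trimming whitespace so that formatting noise
    is not counted as a disagreement.
    """
    report: dict[str, list[str]] = {
        "agree": [],
        "disagree": [],
        "only_rule_based": [],
        "only_llm": [],
    }
    # Union of field names from both sides, sorted for a stable report.
    for key in sorted(rule_based.keys() | llm.keys()):
        if key not in llm:
            report["only_rule_based"].append(key)
        elif key not in rule_based:
            report["only_llm"].append(key)
        elif rule_based[key].strip() == llm[key].strip():
            report["agree"].append(key)
        else:
            report["disagree"].append(key)
    return report
```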

Context

  • Runs locally by default via Ollama
  • Any OpenAI-compatible or vLLM endpoint works as a drop-in replacement
  • Each run is stored in its own folder so results are reproducible
Python · Flet · OCR · Ollama · PyMuPDF · pytest
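One way to get per-run folders like the ones described for my-ocr, sketched with the standard library only. The folder naming (UTC timestamp plus a prefix of the input's SHA-256) and the `manifest.json` contents are assumptions for illustration; my-ocr's actual layout may differ.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def create_run_dir(root: Path, source_pdf: Path, config: dict) -> Path:
    """Create a fresh folder for one run and record what produced it (hypothetical layout)."""
    digest = hashlib.sha256(source_pdf.read_bytes()).hexdigest()
    # Microsecond timestamp so back-to-back runs on the same file get distinct folders.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    run_dir = root / f"{stamp}_{digest[:12]}"
    run_dir.mkdir(parents=True, exist_ok=False)  # never overwrite an earlier run
    manifest = {
        "source": source_pdf.name,
        "sha256": digest,
        "created_utc": stamp,
        "config": config,
    }
    (run_dir / "manifest.json").write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return run_dir
```

Hashing the input means a run can be matched to the exact source file even if it was later renamed, and `exist_ok=False` makes an accidental overwrite fail loudly instead of mixing results.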
02

Address clustering and entity labeling

Bitcoin De-anonymization

Used the multi-input heuristic to cluster Bitcoin addresses, then pulled entity labels from public sources. The goal was to see how much could actually be identified from on-chain data alone.
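The multi-input heuristic assumes that all addresses spent together as inputs to one transaction are controlled by the same owner. Because that relation chains across transactions, a union-find (disjoint-set) structure is a natural fit. A minimal sketch, assuming transactions are already reduced to lists of input addresses; the class and function names are hypothetical and this is not the project's code.

```python
class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.size: dict[str, int] = {}

    def find(self, x: str) -> str:
        """Return the cluster representative for x, registering x if unseen."""
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:  # attach the smaller tree under the larger
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_by_common_input(transactions: list[list[str]]) -> list[set[str]]:
    """Merge all input addresses of each transaction into one cluster."""
    uf = UnionFind()
    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        uf.find(first)  # registers single-input addresses as their own cluster
        for addr in inputs[1:]:
            uf.union(first, addr)
    clusters: dict[str, set[str]] = {}
    for addr in list(uf.parent):
        clusters.setdefault(uf.find(addr), set()).add(addr)
    return list(clusters.values())
```

The same transitivity explains why false positives are hard to detect: a single wrong merge, for example from a CoinJoin transaction where several independent users contribute inputs, joins two otherwise unrelated clusters, and nothing in the structure flags it.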

Outcomes

  • Clustered around 1.2M addresses
  • Added caching after label lookups became the main bottleneck
  • Identified 40+ distinct entity groups

Context

  • Only public data, no proprietary feeds
  • Rate limits on label lookups made enrichment slow
  • The heuristic merges addresses aggressively; false positives are hard to detect
Python · Selenium · Algorithms · Web Scraping
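Caching and rate limiting for label lookups can be combined in one small wrapper: repeated addresses are served from memory, and only cache misses wait for the rate limit. A sketch under stated assumptions: `fetch` is a hypothetical stand-in for whatever public label source is queried, and the one-second default interval is a placeholder, not any real provider's limit.

```python
import time
from functools import lru_cache
from typing import Callable, Optional


def make_label_lookup(
    fetch: Callable[[str], Optional[str]], min_interval: float = 1.0
) -> Callable[[str], Optional[str]]:
    """Wrap a label source with an unbounded cache and a minimum gap between requests.

    Cached addresses return immediately; only cache misses call `fetch`,
    and those calls are spaced at least `min_interval` seconds apart.
    """
    last_call = float("-inf")

    @lru_cache(maxsize=None)
    def lookup(address: str) -> Optional[str]:
        nonlocal last_call
        wait = last_call + min_interval - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        last_call = time.monotonic()
        return fetch(address)

    return lookup
```

`functools.lru_cache` also stores `None` results, so addresses with no known label are not looked up a second time either.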
03

On-prem CI/CD and release automation

Hospital Flow Platform

Backend and deployment work for a hospital patient-flow system in an air-gapped environment. The challenge was getting reliable, auditable releases with no cloud infrastructure and multi-team approval gates in the way.

Outcomes

  • Release checklist went from 12 manual steps to 3
  • Deploy time dropped from several hours to about 20 minutes
  • Pipeline logs gave the team a clear audit trail per release

Context

  • Air-gapped: no external network access or registries
  • Multi-team approvals required between pipeline stages
  • Everything self-hosted — registry, runner, artifact storage
GitLab CI/CD · Docker · NestJS · Angular
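The multi-team approval gates and per-release audit trail can be modeled in a few lines. This is a hypothetical Python sketch, not the platform's pipeline code and not how GitLab itself enforces approvals: a stage opens only when every required team has signed off, and each check is appended to a log as one JSON line. Stage and team names are placeholders.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StageGate:
    """Hypothetical gate: a pipeline stage may run only after every required team approves."""

    stage: str
    required_teams: frozenset[str]
    approvals: dict[str, str] = field(default_factory=dict)  # team -> approver

    def approve(self, team: str, approver: str) -> None:
        if team not in self.required_teams:
            raise ValueError(f"{team!r} cannot approve stage {self.stage!r}")
        self.approvals[team] = approver

    def missing(self) -> set[str]:
        return set(self.required_teams) - set(self.approvals)

    def try_open(self, audit_log: list[str]) -> bool:
        """Append the decision to the audit log and return whether the stage may run."""
        missing = self.missing()
        entry = {
            "time_utc": datetime.now(timezone.utc).isoformat(),
            "stage": self.stage,
            "approvals": dict(sorted(self.approvals.items())),
            "missing": sorted(missing),
            "allowed": not missing,
        }
        audit_log.append(json.dumps(entry, sort_keys=True))
        return not missing
```

Logging refused attempts as well as successful ones is what makes the trail useful in an audit: it shows who approved, when, and what was still outstanding at each check.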

I'm open to backend, platform, and ML infrastructure work in Copenhagen or remote across the EU. Freelance projects can also be a good fit while I'm studying. If you want to reach out, email is easiest. A short note about the role, project, or system you're working on is enough.

gennaro.baratta@gmail.com
© 2026 Gennaro Baratta