“In the middle of every difficulty lies opportunity.” β€” Albert Einstein

🧠 Migrating to OpenSearch with Data Prepper for Log Ingestion

This post chronicles the replacement of Filebeat OSS with OpenSearch Data Prepper in the segmented OPFORGE lab. It captures troubleshooting, configuration, and validation processes that highlight modern, modular ingestion practices suited for a realistic Blue Team SIEM environment.


πŸ“Œ Abstract

Problem: Filebeat OSS 7.12.x caused ingestion failures with OpenSearch 2.13.0 due to unsupported _type metadata in bulk events.

Approach: Replace Filebeat with OpenSearch Data Prepper and rewire ingestion pipelines using declarative YAML configs and mounted volumes.

Alignment: Aligns with GCFA and GCFR (forensic ingestion), GPYC (log analysis), and GREM (log-based malware traces). Also relevant for CISSP/OSCP foundations on secure system design and detection control validation.

Outcome: Successful ingestion into log-pipeline-* indices using Data Prepper, enabling log parsing and future ML-driven detection correlation.


πŸ“š Prerequisites

  • OpenSearch 2.13.0 running in Docker Compose
  • Basic Linux + Docker knowledge
  • Segmented OPFORGE lab with opf-log01 node
  • Prior failed attempts to use Filebeat OSS

βœ… Tasks This Phase

  • Remove legacy Filebeat configuration
  • Create Data Prepper pipeline and config files
  • Mount them into container via docker-compose.yml
  • Generate test logs and verify end-to-end ingestion

πŸ”§ Configuration Summary

🧹 Purge Filebeat

# Stop services
cd ~/Logstack
docker compose down

# Clean up legacy beats
rm -rf filebeat.yml logs/ /usr/share/filebeat

πŸ“Έ Screenshot: Filebeat logs with _type rejection error


🧬 Data Prepper Pipeline (pipelines/log-pipeline.yaml)

version: "2"
pipeline:
  source:
    file:
      path: "/usr/share/data-prepper/logs/*.log"
      scan_interval: "5s"
      encoding: "utf-8"
      format: "plain_text"
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        username: "admin"
        password: "AdminAdminqwer123!"
        index: "log-pipeline-%{+yyyy.MM.dd}"
        insecure: true

βš™οΈ Data Prepper Config (data-prepper-config.yaml)

ssl: false
log-pipeline-default-target: stdout

πŸ“Έ Screenshot: data-prepper config directory with pipeline and logs mount visible


🐳 Compose Add-on (docker-compose.yml snippet)

data-prepper:
  image: opensearchproject/data-prepper:2.3.0
  container_name: data-prepper
  volumes:
    - ./pipelines:/usr/share/data-prepper/pipelines
    - ./data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml
    - ./logs:/usr/share/data-prepper/logs
  environment:
    - DATA_PREPPER_LOGS_STDOUT=true
  ports:
    - "4900:4900"
    - "21890:21890"
  networks:
    - logstack_net
  depends_on:
    - opensearch
  restart: unless-stopped

πŸ§ͺ Test Log Injection

mkdir -p logs
echo "πŸš€ Hello OPFORGE from Data Prepper - $(date)" >> logs/test.log

πŸ“Έ Screenshot: OPFORGE message in test.log


πŸ” Validate Index Creation

curl -u admin:AdminAdminqwer123! -k \
  "https://localhost:9200/log-pipeline-*/_search?q=message:OPFORGE&pretty"

If empty:

docker logs -f data-prepper

πŸ“Έ Screenshot: successful log-pipeline-* index in _cat/indices


🚨 Troubleshooting: Disk Space & Permissions

df -h
sudo du -sh /* 2>/dev/null | sort -hr | head -n 15
sudo journalctl --vacuum-time=2d
sudo apt clean
sudo docker system prune -a

πŸ“Έ Screenshot: flood watermark warning from OpenSearch logs


🌟 Key Takeaways

  • Filebeat OSS is no longer viable for OpenSearch 2.13.x due to _type incompatibility
  • Data Prepper offers a drop-in file source pipeline with better error visibility
  • This lays the groundwork for parsing, enrichment, and ECS normalization

🧭 On Deck

  • Add Sysmon + Zeek logs into the Data Prepper stream
  • Use Jupyter to correlate detections with OPFORGE Red Team triggers
  • Replace raw logs with parsed ECS format for dashboards

OPFORGE’s detection fabric is now ingesting structured logsβ€”ready for enrichment, analytics, and red-vs-blue scenarios.

β€”H.Y.P.R.