“In the middle of every difficulty lies opportunity.” β Albert Einstein
π§ Migrating to OpenSearch with Data Prepper for Log Ingestion
This post chronicles the replacement of Filebeat OSS with OpenSearch Data Prepper in the segmented OPFORGE lab. It captures troubleshooting, configuration, and validation processes that highlight modern, modular ingestion practices suited for a realistic Blue Team SIEM environment.
π Abstract
Problem: Filebeat OSS 7.12.x caused ingestion failures with OpenSearch 2.13.0 due to unsupported _type
metadata in bulk events.
Approach: Replace Filebeat with OpenSearch Data Prepper and rewire ingestion pipelines using declarative YAML configs and mounted volumes.
Alignment: Aligns with GCFA and GCFR (forensic ingestion), GPYC (log analysis), and GREM (log-based malware traces). Also relevant for CISSP/OSCP foundations on secure system design and detection control validation.
Outcome: Successful ingestion into log-pipeline-*
indices using Data Prepper, enabling log parsing and future ML-driven detection correlation.
π Prerequisites
- OpenSearch 2.13.0 running in Docker Compose
- Basic Linux + Docker knowledge
- Segmented OPFORGE lab with
opf-log01
node - Prior failed attempts to use Filebeat OSS
β Tasks This Phase
- Remove legacy Filebeat configuration
- Create Data Prepper pipeline and config files
- Mount them into container via
docker-compose.yml
- Generate test logs and verify end-to-end ingestion
π§ Configuration Summary
π§Ή Purge Filebeat
# Stop services
cd ~/Logstack
docker compose down
# Clean up legacy beats
rm -rf filebeat.yml logs/ /usr/share/filebeat
πΈ Screenshot: Filebeat logs with
_type
rejection error
𧬠Data Prepper Pipeline (pipelines/log-pipeline.yaml)
version: "2"
pipeline:
source:
file:
path: "/usr/share/data-prepper/logs/*.log"
scan_interval: "5s"
encoding: "utf-8"
format: "plain_text"
sink:
- opensearch:
hosts: ["https://opensearch:9200"]
username: "admin"
password: "AdminAdminqwer123!"
index: "log-pipeline-%{+yyyy.MM.dd}"
insecure: true
βοΈ Data Prepper Config (data-prepper-config.yaml)
ssl: false
log-pipeline-default-target: stdout
πΈ Screenshot: data-prepper config directory with pipeline and logs mount visible
π³ Compose Add-on (docker-compose.yml snippet)
data-prepper:
image: opensearchproject/data-prepper:2.3.0
container_name: data-prepper
volumes:
- ./pipelines:/usr/share/data-prepper/pipelines
- ./data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml
- ./logs:/usr/share/data-prepper/logs
environment:
- DATA_PREPPER_LOGS_STDOUT=true
ports:
- "4900:4900"
- "21890:21890"
networks:
- logstack_net
depends_on:
- opensearch
restart: unless-stopped
π§ͺ Test Log Injection
mkdir -p logs
echo "π Hello OPFORGE from Data Prepper - $(date)" >> logs/test.log
πΈ Screenshot: OPFORGE message in test.log
π Validate Index Creation
curl -u admin:AdminAdminqwer123! -k \
"https://localhost:9200/log-pipeline-*/_search?q=message:OPFORGE&pretty"
If empty:
docker logs -f data-prepper
πΈ Screenshot: successful
log-pipeline-*
index in_cat/indices
π¨ Troubleshooting: Disk Space & Permissions
df -h
sudo du -sh /* 2>/dev/null | sort -hr | head -n 15
sudo journalctl --vacuum-time=2d
sudo apt clean
sudo docker system prune -a
πΈ Screenshot: flood watermark warning from OpenSearch logs
π Key Takeaways
- Filebeat OSS is no longer viable for OpenSearch 2.13.x due to
_type
incompatibility - Data Prepper offers a drop-in file source pipeline with better error visibility
- This lays the groundwork for parsing, enrichment, and ECS normalization
π§ On Deck
- Add Sysmon + Zeek logs into the Data Prepper stream
- Use Jupyter to correlate detections with OPFORGE Red Team triggers
- Replace raw logs with parsed ECS format for dashboards
OPFORGE’s detection fabric is now ingesting structured logsβready for enrichment, analytics, and red-vs-blue scenarios.
βH.Y.P.R.