2025

Riddle Registry — Metadata Forensics and Encoded Artifact Recovery from a PDF

Structured forensic inspection of a PDF file using integrity verification, string analysis, and metadata examination to recover a Base64-encoded flag from the Author field.

Digital Forensics | PDF Forensics | Metadata | CTF

Category: Digital Forensics | Metadata Analysis | PDF Forensics
Tools Used: file, ls, sha256sum, strings, ripgrep, exiftool, Base64 Decoder
Difficulty: Beginner


Objective

This analysis examines a PDF file labeled confidential.pdf suspected of containing a concealed flag. The objective was to conduct a structured forensic inspection of the file — beginning with integrity verification, followed by string analysis and metadata examination — to identify and extract any hidden artifacts.


Methodology

1. Evidence Acquisition and Integrity Verification

Before any analysis was performed, the file was examined in an isolated environment to establish a baseline record. File type verification, size, and a cryptographic hash were documented:

file confidential.pdf
ls -lh confidential.pdf
sha256sum confidential.pdf

Finding: The file was confirmed as a valid PDF document. The SHA-256 hash was recorded to establish an integrity baseline, so that any later comparison could verify the file had not been modified during analysis. This step reflects standard evidence-handling practice: document the state of the file before examining it.
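The baseline only pays off if it is checked again later. The following is a minimal sketch of that round trip, assuming GNU coreutils; the helper names record_baseline and verify_baseline and the .sha256 file suffix are illustrative choices, not part of the original analysis.

```shell
# Hypothetical helpers: record a SHA-256 baseline, then re-check it later.

record_baseline() {
    # Writes "<hash>  <file>" to <file>.sha256 (the suffix is an arbitrary choice).
    sha256sum "$1" > "$1.sha256"
}

verify_baseline() {
    # Returns 0 if the file still matches the recorded hash, non-zero otherwise.
    # --quiet suppresses the "OK" line; mismatches are still reported.
    sha256sum --check --quiet "$1.sha256"
}
```

Calling record_baseline confidential.pdf before the analysis and verify_baseline confidential.pdf afterwards, from the same working directory, would confirm that none of the inspection steps altered the evidence file.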


2. Visible String Analysis

The next step was scanning the raw file contents for any readable strings, specifically searching for flag-formatted patterns. This technique is effective at identifying data embedded in plain text within a file's byte stream, regardless of whether it is visible when the file is rendered normally:

strings confidential.pdf | grep -iF "picoCTF{"

Finding: No flag-formatted strings were detected among the file's printable strings. A plain-text search cannot match a value that has been encoded or stored in a compressed stream, so this result narrowed the investigation toward locations such as file metadata fields, which are not rendered during normal document viewing and are frequently overlooked.
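One way to widen this kind of search is to look for the flag prefix in Base64 form as well as in plain text. This was not part of the original analysis. The sketch below uses a hypothetical helper, find_flag_candidates. It relies on the fact that the first ten Base64 characters of "picoCTF{" are always cGljb0NURn, and it assumes the encoded data begins with the flag; a flag embedded mid-way through a longer encoded value would align differently and be missed.

```shell
# Sketch: search printable strings for the flag prefix in two forms.
# find_flag_candidates is a hypothetical helper name.

find_flag_candidates() {
    # Plain-text form, case-insensitive; -F treats "{" as a literal character.
    strings "$1" | grep -iF 'picoCTF{'
    # Base64 form of the prefix "picoCTF{". Base64 is case-sensitive,
    # so this search deliberately omits -i.
    strings "$1" | grep -F 'cGljb0NURn'
}
```

The function's exit status is that of the second search, so it reports success only when a Base64-encoded candidate is present.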


3. Metadata Inspection

PDF documents support two metadata formats: the Info Dictionary and XMP metadata. Both can contain arbitrary text fields that are invisible to a standard reader but accessible through forensic tools. exiftool was used to perform a full metadata extraction:

exiftool confidential.pdf

Output:

ExifTool Version Number         : 12.40
File Name                       : confidential.pdf
File Size                       : 178 KiB
File Type                       : PDF
MIME Type                       : application/pdf
PDF Version                     : 1.7
Linearized                      : No
Page Count                      : 1
Producer                        : PyPDF2
Author                          : cGljb0NURntwdXp6bDNkX20zdGFkYXRhX2YwdW5kIV9lZTQ1NDk1MH0=

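Finding: The Author metadata field contained an anomalous value ending in =, a common characteristic of Base64-encoded strings. Base64 appends one or two = padding characters when the input length is not a multiple of three bytes, so a trailing = on a value drawn entirely from the Base64 alphabet is a strong visual hint of this encoding. The absence of padding does not rule Base64 out, since input whose length is a multiple of three needs none.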
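The visual check can be turned into a rough test. The sketch below, using a hypothetical helper named looks_like_base64, checks two structural properties of padded Base64: the value uses only the standard alphabet with at most two trailing = characters, and its length is a multiple of four. It is a heuristic only; a short ordinary word of four or eight letters passes it as well.

```shell
# Heuristic sketch: does a value have the shape of padded Base64?
# looks_like_base64 is a hypothetical helper name.

looks_like_base64() {
    candidate=$1
    # Standard alphabet only, with at most two "=" characters at the end.
    printf '%s\n' "$candidate" | grep -Eq '^[A-Za-z0-9+/]+={0,2}$' || return 1
    # Padded Base64 always has a length that is a multiple of four.
    [ $(( ${#candidate} % 4 )) -eq 0 ]
}
```

Applied to the exiftool output above, the 56-character Author value passes both checks, while the Producer value PyPDF2 fails the length check because it is six characters long.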

Notably, the Producer field identified PyPDF2, a Python-based PDF library, as the tool used to generate the document. This suggests the file was programmatically generated, which is consistent with intentional metadata manipulation rather than a standard document workflow.
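When only one field is of interest, exiftool can be asked for that value alone, and it can also show which metadata group each tag came from. The sketch below wraps both calls in hypothetical helper functions. The -s3, -a, -G1, and -s options are standard exiftool options, but the group names that appear in the output depend on the file.

```shell
# Sketch: two narrower exiftool invocations (helper names are hypothetical).

author_value() {
    # -s3 prints the bare value with no tag name, ready to pipe into a decoder.
    exiftool -s3 -Author "$1"
}

list_metadata_by_group() {
    # -a keeps duplicate tags, -G1 prefixes each tag with its group name,
    # which helps separate Info dictionary entries from XMP entries.
    exiftool -a -G1 -s "$1"
}
```

For example, author_value confidential.pdf would print only the encoded string, and list_metadata_by_group confidential.pdf would show whether the same field is also present in XMP.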


4. Base64 Decoding and Flag Recovery

The anomalous Author field value was submitted to a Base64 decoder for analysis:

cGljb0NURntwdXp6bDNkX20zdGFkYXRhX2YwdW5kIV9lZTQ1NDk1MH0=

Finding: Decoding the string produced the following readable output:

picoCTF{puzzl3d_m3tadata_f0und!_ee454950}

The concealed flag was successfully recovered from the document's metadata.
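The decoding can also be reproduced locally rather than through an online decoder. This is a minimal sketch, assuming a base64 utility that accepts --decode, as the GNU coreutils version does; the variable names are illustrative.

```shell
# Decode the Author field value recovered in step 3.
encoded='cGljb0NURntwdXp6bDNkX20zdGFkYXRhX2YwdW5kIV9lZTQ1NDk1MH0='

# printf avoids adding a trailing newline to the decoder's input.
decoded=$(printf '%s' "$encoded" | base64 --decode)

printf '%s\n' "$decoded"
# prints: picoCTF{puzzl3d_m3tadata_f0und!_ee454950}
```

Decoding locally keeps the evidence value inside the isolated analysis environment, which fits the handling approach used in step 1.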


Summary of Findings

Step | Finding
Integrity Verification | SHA-256 hash recorded; file confirmed as valid PDF
String Analysis | No flag-formatted strings found among the file's printable strings
Metadata Inspection | Anomalous Base64-encoded string identified in Author field
Producer Analysis | PyPDF2 identified as document generator, consistent with programmatic metadata manipulation
Base64 Decoding | Flag successfully recovered from decoded Author field value

Conclusion

This challenge demonstrated the use of PDF metadata fields as a concealment vector. By embedding an encoded payload within the Author field — a location invisible during normal document rendering — the flag was effectively hidden from casual inspection while remaining accessible through forensic metadata analysis.

The structured approach applied here — integrity verification, visible content analysis, and metadata examination — reflects the layered inspection methodology used in real forensic investigations when analyzing documents for concealed data, exfiltrated information, or indicators of tampering. PDF metadata manipulation is particularly relevant in real-world cases involving document fraud, insider threat investigations, and malware delivery through weaponized documents.

Made at picoCTF