Breach Data Curation Techniques

INTRO: 👋

If you're reading this, then we hope that you have already read the README.md and our disclaimer at the bottom of the document; if you have not, please do so and come back. 😅 Thank you.

The techniques discussed in this document should be considered the FIRST and FOREMOST steps to take before diving head-first into the internet as a whole. How we keep the data we discover says a lot about how much we care about our personal security. Information about data breaches that is discoverable on the internet can be quite sensitive and sometimes NSFW in nature, so it is best to take only what you "need", no more, no less; this protects you and those involved from the ramifications of leaking this data yourself. The data should be treated with respect: it was never ours to begin with, it can be used to harm an individual's online presence and identity, and it can reveal personal details about the people involved in the breach. How this breach data is obtained and stored, both during and after the research, will determine the fallout if you are found holding it without justification. Research discretion is advised.

Now that we've got the formalities out of the way, let's talk about data storage. This is the first step before beginning the research yourself because it determines how you will perform and save your research from the start. In our practice as data researchers and security consultants, on behalf of our clients, we employ strict operational security while researching breach data. This is for two primary reasons: to protect our identities from hostile communities and threat actors, and to protect ourselves, our clients, and our company/employer as a whole. As recommended by the Tor Project itself, we use a system akin to Tails OS when connecting to the internet. More details can be found on the Tails website: https://tails.net/

💡 Please note that while we advocate using the Tails operating system via bootable USB (https://tails.net/install/linux/index.en.html), a similar privacy and security model can be attained using Kali Linux as a bootable USB with encrypted persistence: https://www.kali.org/docs/usb/usb-persistence-encryption/

SIDE-QUEST: Let's talk about hardware.

The type of disk you use depends on how the research will be conducted. For example, a 2TB hard disk drive used as a bootable medium has the advantage of storage space, but in practice it may not be suitable. We recommend a small-form-factor USB drive with a minimum of 16 GB. Remember, this is our system to access the internet, not to store our data! 🧠

For our primary data storage device, we have several options depending on the application. Ideally, we'll use a mobile, easily concealable approach: a micro SD card or a small-form-factor thumb drive like the one used for our bootable system, only with more storage space. This device handles raw data as it's discovered, which may be rather large. We'll get into long-term cold storage in a bit (no pun intended).

💡 Note: You might be wondering why we wouldn't just keep the discovered data on the bootable drive with encrypted persistence. This is by design, to prevent a single point of failure: if the bootable system is ever lost or compromised, the data obtained from our research is stored outside of it.

MAIN-QUEST: Encrypted Storage.

Enter VeraCrypt. While LUKS is an excellent option, as is EncFS, we highly recommend VeraCrypt because it is easy to use and offers strong encryption algorithms (AES, Serpent, Twofish, etc.). The operator can also double or triple encrypt disks (volumes) or containers by cascading different algorithms. It also comes with a "plausible deniability" feature, which is out of scope for this documentation, but needless to say, it helps us sleep better at night. Following the VeraCrypt documentation, we take our primary data storage device and create a VeraCrypt volume utilizing the entire disk.

💡 Note on passwords: VeraCrypt supports several forms of authentication. Feel free to explore them, but whatever you do, use a password manager. We recommend KeePassXC since it stores all passwords locally in an encrypted database on disk.
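
KeePassXC ships with its own password generator, and that is what we'd reach for first. If you ever need to script the same idea, the sketch below shows one way to do it with Python's standard `secrets` module. The function name, the default length, and the minimum length are our own assumptions for illustration, not anything VeraCrypt or KeePassXC prescribes.

```python
import secrets
import string

# Assumption: letters, digits, and ASCII punctuation are all acceptable characters.
ALPHABET = string.ascii_letters + string.digits + string.punctuation
MIN_LENGTH = 20  # hypothetical floor chosen for this sketch


def generate_password(length: int = 32) -> str:
    """Return a random password drawn from a cryptographically secure source."""
    if length < MIN_LENGTH:
        raise ValueError(f"length must be at least {MIN_LENGTH}")
    # secrets.choice draws from the operating system's CSPRNG, unlike random.choice.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Whatever generates the password, store the result in the password manager rather than anywhere in plain text.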

TREKKING ON: Browsing and Persistent storage.

The tools mentioned above (Tor Browser, KeePassXC, VeraCrypt, etc.) should be installed on the main bootable operating system drive using encrypted persistence, so that we can access them to interact with our primary data storage device. Follow the VeraCrypt documentation for decrypting and mounting volumes. If you're using Tails, Tor Browser is already installed and configured properly. If you're using any other operating system, such as Kali Linux mentioned above, follow the installation instructions on the Tor Project's website: https://www.torproject.org/download/

GOING DARK: Offline and Air Gapped Cold Storage

There are several ways to offload data to our encrypted storage device. Our favorite technique is a hybrid blend of air gapping a typically cold storage device while offline...💥 (Yeah, I know, that was a mouthful. Let's explain.) Basically, we have a very large (20TB or larger) spinning hard disk drive (as opposed to an SSD, because of the limitations of SSDs when it comes to data destruction: https://www.usenix.org/legacy/events/fast11/tech/full_papers/Wei.pdf). We take the drive and create a massive encrypted VeraCrypt volume on the disk, then leave it unplugged whenever we're not using it. We recommend storing this cold storage device in a personal safe or with your company or employer. Whenever we need to offload newly discovered breach data, we locate the drive, plug it into the device running our bootable USB, confirm that networking on the device is completely disabled (unplugged), then mount the volume and migrate our new data to the hard disk.
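
Physically unplugging the cable (and switching off any wireless hardware) is still the real check. As a belt-and-braces step, a small sketch like the one below can list any interface the Linux kernel does not report as down before we mount the cold storage volume. It assumes a Linux system that exposes interface state under /sys/class/net, and the function name is hypothetical.

```python
from pathlib import Path
from typing import List


def interfaces_not_down(sys_net: str = "/sys/class/net") -> List[str]:
    """Return non-loopback interfaces whose operstate is anything but 'down'."""
    base = Path(sys_net)
    if not base.is_dir():
        raise FileNotFoundError(f"{sys_net} not found; is this a Linux system?")

    suspicious = []
    for iface in sorted(base.iterdir()):
        # Skip the loopback device and any stray non-directory entries.
        if iface.name == "lo" or not iface.is_dir():
            continue
        try:
            state = (iface / "operstate").read_text().strip()
        except OSError:
            # If the state cannot be read, err on the side of caution.
            state = "unknown"
        # Conservative choice: treat every state other than 'down' as a warning.
        if state != "down":
            suspicious.append(f"{iface.name} ({state})")
    return suspicious
```

An empty list is what we want to see. Anything else means stop and disconnect before mounting. Keep in mind that a "down" interface can be brought back up later, so this complements the physical check rather than replacing it.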

STAYING ALIVE: Wash your filthy hands!

While we're offline on our now secure air-gapped system, preferably in a safe environment for reviewing potentially sensitive materials, we can review and remove all datasets we don't need. The simplest technique is to filter for only the data we want using regular expressions along with our favorite scripting or programming language. This should suffice for curating the exact data we need to help our clients during an approved assessment. 👀 Once we've curated only the necessary data, we destroy the original artifact using a secure deletion tool such as shred (part of GNU coreutils, so usually already installed), wipe, or srm from the Secure-Delete suite. On Debian-based operating systems (Kali and Tails), the latter two can be installed with: sudo apt-get install wipe secure-delete

The shred command overwrites the file's contents in place to make forensic recovery much harder. Overwriting may be less reliable on SSDs, flash media, and journaling or copy-on-write file systems, which is another reason we prefer spinning disks for long-term storage. Be sure to run it against the original artifact housed on your primary storage device as well, if you haven't already. Once all is said and done, dismount the volume (the data stays encrypted at rest) and unplug the device to return it to cold storage.
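
As a rough sketch of the filtering step, the snippet below keeps only the lines that contain an e-mail address at a single in-scope domain. Everything here is an assumption for illustration: the function names, the one-record-per-line layout, and the placeholder domain example.com. Real datasets vary wildly in format, so expect to adapt the pattern.

```python
import re
from typing import Iterable, Iterator


def build_domain_pattern(domain: str) -> re.Pattern:
    """Compile a case-insensitive pattern for e-mail addresses at `domain`."""
    return re.compile(
        r"[A-Za-z0-9._%+-]+@"
        # re.escape keeps the dots in the domain from matching any character.
        + re.escape(domain)
        # Reject longer look-alike hosts such as example.com.evil.net.
        + r"(?![A-Za-z0-9-]|\.[A-Za-z0-9])",
        re.IGNORECASE,
    )


def filter_records(lines: Iterable[str], domain: str) -> Iterator[str]:
    """Yield only the lines that mention an address at the in-scope domain."""
    pattern = build_domain_pattern(domain)
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Entirely made-up sample records.
    sample = [
        "alice@example.com:field2\n",
        "bob@example.com.evil.net:field2\n",
        "carol@other.org:field2\n",
    ]
    for record in filter_records(sample, "example.com"):
        print(record)
```

Because the function works on any iterable of lines, an open file handle can be passed in directly, so large files never need to be loaded into memory at once. Note that this pattern deliberately skips subdomains (for example, user@mail.example.com); widen it only if those are in scope for the assessment.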

CONCLUSION: The end is only the beginning. 💀

This is the part where you rinse and repeat this practice until it becomes your new norm when performing any potentially shady research. If, while reading this, you've thought of several methods or techniques that weren't mentioned above, you're likely correct and should do those things. We likely left them out for a reason, but if you already practice good operational security (OpSec), then most of this should sound very familiar. If not... well, then we hope you learned something. 📚 Stay safe out there, and...

HACK THE PLANET! 💻🖧🌍
