Active, posted about 7 hours ago

Master's Thesis in Data & AI: Privacy preserving RAG

Info Support
Veenendaal 47.000 - 63.000 Full-time Privacy Data Graduation internship

Status

Active

Contract

Full-time

Location

Veenendaal

Salary

47.000 - 63.000

Privacy Data Graduation internship

<p><strong>Privacy is a critical challenge in deploying Retrieval-Augmented Generation (RAG) systems in sensitive domains. This thesis investigates how privacy-preserving techniques, such as differential privacy and synthetic data, can be integrated into RAG pipelines without degrading output quality. You will analyze trade-offs, enhance a promising method, and validate your approach with a Proof of Concept focused on real-world utility and privacy guarantees.</strong></p>
<p><strong>💡 Areas of Interest:&nbsp;</strong>Information retrieval, AI, data privacy, NLP, differential privacy</p>
<p>Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by incorporating related external knowledge into prompts. This mitigates hallucinations and improves output quality, especially when the information falls outside the model's original training data. However, RAG systems currently offer no guarantees that privacy-sensitive content will remain protected in their outputs, posing significant compliance and ethical risks. Consequently, such sources are often excluded from RAG applications, limiting their effectiveness in privacy-critical sectors like healthcare, legal services, finance, and government. To fully leverage RAG's potential in these domains, we need robust, scalable methods to preserve privacy without compromising performance. This thesis addresses the challenge of preserving privacy in RAG systems.</p>
<h2>The Assignment</h2>
<p>Your research will include two components:</p><ul>
<li><strong>Literature Study</strong><ul>
<li>Review state-of-the-art methods for privacy-preserving RAG. Focus areas include:<ul>
<li>Differentially Private In-Context Learning (e.g., DP-ICL2)</li>
<li>Synthetic document generation (e.g., SAGE)</li>
<li>Private fine-tuning (e.g., DP-SGD, masking techniques)</li>
</ul></li>
<li>Analyze trade-offs between privacy guarantees and model utility.</li>
</ul></li>
<li><strong>Proof of Concept (PoC)</strong><ul>
<li>Select one promising technique and enhance it.</li>
<li>Ensure your improvement addresses gaps identified in the literature.</li>
<li>Build and evaluate a PoC integrating your privacy method into a RAG pipeline.</li>
<li>Evaluation metrics:<ul>
<li><strong>Privacy:</strong> Differential Privacy parameters (ε, δ)</li>
<li><strong>Utility:</strong> Accuracy, BLEU/ROUGE scores, latency</li>
</ul></li>
</ul></li>
</ul><p><strong>Research Question</strong></p><p>You will start with the following broad research question, which you can tailor to your most promising approach later on.</p><p>"How can privacy be preserved in Retrieval-Augmented Generation systems without sacrificing model utility?"</p>
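<p>To make the privacy metric in the evaluation list concrete, here is a minimal sketch of the privacy/utility trade-off behind the ε parameter. It is an illustration only and is not taken from the assignment or from the dp-rag baseline. It assumes a simple counting query (sensitivity 1) protected with the Laplace mechanism, which gives pure ε-differential privacy (δ = 0). The function names <code>private_count</code> and <code>mean_abs_error</code> are hypothetical, and the floating-point sampling is not hardened for production use.</p>

```python
import random


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise calibrated to epsilon.

    Assumption: adding or removing one record changes the count by at most 1
    (L1 sensitivity 1), so the Laplace noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rate = epsilon  # rate = 1 / scale, with scale = 1 / epsilon
    # The difference of two i.i.d. exponential samples is Laplace-distributed.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


def mean_abs_error(epsilon: float, trials: int = 2000, seed: int = 0) -> float:
    """Average absolute error of the noisy count: a simple utility measure."""
    rng = random.Random(seed)
    true_count = 100  # illustrative value, not real data
    total = 0.0
    for _ in range(trials):
        total += abs(private_count(true_count, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    # Smaller epsilon means stronger privacy and larger expected error.
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: mean absolute error ~ {mean_abs_error(eps):.2f}")
```

<p>The expected absolute error of the Laplace mechanism equals its scale, 1/ε, so tightening the privacy budget by a factor of ten costs roughly ten times more error. A RAG-level evaluation would replace this error measure with the utility metrics listed above, such as accuracy or BLEU/ROUGE.</p>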

<p><strong>Materials</strong></p>
<ol>
<li>Baseline project: <a href="https://github.com/sarus-tech/dp-rag">https://github.com/sarus-tech/dp-rag</a><p>Paper: RAG with Differential Privacy <a href="https://www.arxiv.org/pdf/2412.19291">https://www.arxiv.org/pdf/2412.19291</a></p><p>Medium article: <a href="https://medium.com/sarus/introducing-dp-rag-9d4edf3f51c8">https://medium.com/sarus/introducing-dp-rag-9d4edf3f51c8</a></p></li>
<li>Paper: Privacy-Preserving In-context Learning with Differentially Private Few-shot Generation: <a href="https://arxiv.org/pdf/2309.11765">https://arxiv.org/pdf/2309.11765</a></li>
<li>Paper: Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data <a href="https://arxiv.org/pdf/2406.14773">https://arxiv.org/pdf/2406.14773</a></li>
</ol>
<p><strong>About Info Support</strong></p><p>Info Support specializes in custom software, data/AI solutions, management, and training and is active in the Finance, Industry, Agriculture, Food &amp; Retail, Mobility &amp; Public, and Healthcare sectors. We provide solid and innovative solutions for complex and critical software issues. Our headquarters are located in Veenendaal (NL) and Mechelen (BE). Info Support currently employs approximately 500 people.</p><p>Info Support's working method is characterized by a number of core values: solidity, integrity, craftsmanship, and passion. These core values are intertwined in our work and the way we interact with each other.</p><p>To keep all employees up to date with the latest developments, Info Support has an in-house knowledge center where they can build new knowledge and skills.</p><p>B2 language proficiency in Dutch is required.</p>
