Status
Active
Contract
Full-time
Location
Veenendaal
Salary
47.000 - 63.000
<p><strong>Privacy is a critical challenge in deploying Retrieval-Augmented Generation (RAG) systems in sensitive domains. This thesis investigates how privacy-preserving techniques, such as differential privacy and synthetic data, can be integrated into RAG pipelines without degrading output quality. You will analyze trade-offs, enhance a promising method, and validate your approach with a Proof of Concept focused on real-world utility and privacy guarantees.</strong></p><p><strong>Areas of Interest: </strong>Information retrieval, AI, data privacy, NLP, differential privacy</p><p>Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by incorporating related external knowledge into prompts. This mitigates hallucinations and improves output quality, especially when the information falls outside the model's original training data. However, RAG systems currently offer no guarantees that privacy-sensitive content will remain protected in their outputs, posing significant compliance and ethical risks. Consequently, such sources are often excluded from RAG applications, limiting their effectiveness in privacy-critical sectors like healthcare, legal services, finance, and government. To fully leverage RAG's potential in these domains, we need robust, scalable methods to preserve privacy without compromising performance. This thesis addresses the challenge of preserving privacy in RAG systems.</p><h2>The Assignment</h2><p>Your research will include two components:</p><ul>
<li><strong>Literature Study</strong><ul>
<li>Review state-of-the-art methods for privacy-preserving RAG. Focus areas include:<ul>
<li>Differentially Private In-Context Learning (e.g., DP-ICL2)</li>
<li>Synthetic document generation (e.g., SAGE)</li>
<li>Private fine-tuning (e.g., DP-SGD, masking techniques)</li>
</ul></li>
<li>Analyze trade-offs between privacy guarantees and model utility.</li>
</ul></li>
<li><strong>Proof of Concept (PoC)</strong><ul>
<li>Select one promising technique and enhance it.</li>
<li>Ensure your improvement addresses gaps identified in the literature.</li>
<li>Build and evaluate a PoC integrating your privacy method into a RAG pipeline.</li>
<li>Evaluation metrics:<ul>
<li><strong>Privacy:</strong> Differential Privacy parameters (ε, δ)</li>
<li><strong>Utility:</strong> Accuracy, BLEU/ROUGE scores, latency</li>
</ul></li>
</ul></li>
</ul><p><strong>Research Question</strong></p><p>You will start with the following broad research question, which you can tailor to your most promising approach later on.</p><p>"How can privacy be preserved in Retrieval-Augmented Generation systems without sacrificing model utility?"</p>
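<p>To make the two evaluation axes above more concrete, the following is a minimal sketch, not part of the assignment or of the DP-RAG baseline. It assumes the classical Gaussian-mechanism calibration (which holds for 0 &lt; ε &lt; 1) to show how a privacy budget (ε, δ) translates into noise, and it uses a simplified, whitespace-tokenized unigram-overlap score as a stand-in for ROUGE-1. The function names <code>gaussian_sigma</code> and <code>rouge1_f1</code> are hypothetical and chosen for illustration only.</p>

```python
import math
from collections import Counter


def gaussian_sigma(epsilon: float, delta: float, sensitivity: float = 1.0) -> float:
    """Noise scale for the classical Gaussian mechanism (illustrative helper).

    Assumes the textbook calibration
        sigma = sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon,
    which is only stated for 0 < epsilon < 1.
    """
    if not 0 < epsilon < 1:
        raise ValueError("this calibration assumes 0 < epsilon < 1")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased whitespace tokens.

    This is a stand-in for a real ROUGE implementation (no stemming,
    no punctuation handling).
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Multiset intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # A smaller epsilon (stronger privacy) requires more noise.
    for eps in (0.1, 0.5, 0.9):
        print(f"epsilon={eps}: sigma={gaussian_sigma(eps, delta=1e-5):.2f}")

    # Utility of a generated answer against a reference answer.
    reference = "the patient was prescribed a low dose of aspirin"
    candidate = "the patient received a low dose of aspirin"
    print(f"ROUGE-1 F1: {rouge1_f1(reference, candidate):.3f}")
```

<p>In a real evaluation, the privacy accounting would come from the chosen method itself, and the utility scores from an established metric implementation; the sketch only shows why the two quantities pull in opposite directions.</p>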
<p><strong>Materials</strong></p>
<ol>
<li>Baseline project: <a href="https://github.com/sarus-tech/dp-rag">https://github.com/sarus-tech/dp-rag</a><p>Paper: RAG with Differential Privacy <a href="https://www.arxiv.org/pdf/2412.19291">https://www.arxiv.org/pdf/2412.19291</a></p><p>Medium article: <a href="https://medium.com/sarus/introducing-dp-rag-9d4edf3f51c8">https://medium.com/sarus/introducing-dp-rag-9d4edf3f51c8</a></p></li>
<li>Paper: Privacy-Preserving In-context Learning with Differentially Private Few-shot Generation: <a href="https://arxiv.org/pdf/2309.11765">https://arxiv.org/pdf/2309.11765</a></li>
<li>Paper: Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data <a href="https://arxiv.org/pdf/2406.14773">https://arxiv.org/pdf/2406.14773</a></li>
</ol><p><strong>About Info Support</strong></p><p>Info Support specializes in custom software, data/AI solutions, management, and training, and is active in the Finance, Industry, Agriculture, Food & Retail, Mobility & Public, and Healthcare sectors. We provide solid and innovative solutions for complex and critical software issues. Our headquarters are located in Veenendaal (NL) and Mechelen (BE). Info Support currently employs approximately 500 people.</p><p>Info Support's working method is characterized by four core values: solidity, integrity, craftsmanship, and passion. These values shape our work and the way we interact with each other.</p><p>To keep all employees up to date with the latest developments, Info Support has an in-house knowledge center that helps them broaden and deepen their knowledge and skills.</p><p>B2 language proficiency in Dutch is required.</p>