Exploring Vector-Based Methods for Effective Document Retrieval

Authors

  • Xian Zhang

DOI:

https://doi.org/10.61173/077sp881

Keywords:

Vector Searching, Document Indexing, Semantic Search, LLM, Word Embedding, Information Retrieval

Abstract

With the development of online storage services and the internet, people, especially students, often collect large numbers of documents from online servers, and searching through such a collection can become difficult. The rapid development of large language models (LLMs) and word embedding techniques offers a new and much more effective way to find information. In this paper, documents and users' questions are converted into vectors with different models, so that the system can capture the meaning of a user's query and retrieve the documents the user is looking for. An LLM can then extract the important information directly from the retrieved documents, allowing users to find what they want in a large document collection with little effort. This paper presents four methods, built on three main models, for converting documents into vectors. The best method retrieves more than 90% of the documents in the test dataset from the given keywords.
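The retrieval idea described in the abstract (embed documents and queries as vectors, then return the documents closest to the query) can be illustrated with a minimal sketch. This is not the author's method: the abstract does not name its three models, so `embed` here is a hypothetical stand-in that uses simple bag-of-words term counts instead of a learned embedding model. `cosine`, `build_index`, `search`, `hit_rate`, and the sample documents are likewise illustrative names and data introduced for this example. `hit_rate` shows one plausible way to measure "the fraction of documents retrieved from the given keywords".

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(docs: dict[str, str]) -> dict[str, Counter]:
    """Vectorize every document once, keyed by its document id."""
    return {doc_id: embed(text) for doc_id, text in docs.items()}


def search(index: dict[str, Counter], query: str, k: int = 1) -> list[str]:
    """Return the ids of the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda doc_id: cosine(q, index[doc_id]), reverse=True)
    return ranked[:k]


def hit_rate(index: dict[str, Counter], queries: dict[str, str], k: int = 1) -> float:
    """Fraction of queries whose expected document appears in the top-k results."""
    hits = sum(1 for query, target in queries.items() if target in search(index, query, k))
    return hits / len(queries)


# Illustrative toy collection (hypothetical data, not the paper's test set).
docs = {
    "linear_algebra.pdf": "matrix eigenvalues and eigenvectors of linear maps",
    "networking.pdf": "tcp congestion control and packet routing",
    "biology.pdf": "cell division mitosis and dna replication",
}
index = build_index(docs)
queries = {
    "eigenvalues of a matrix": "linear_algebra.pdf",
    "dna replication": "biology.pdf",
    "packet routing": "networking.pdf",
}

print(search(index, "how does tcp congestion control work"))  # ['networking.pdf']
print(hit_rate(index, queries))  # 1.0
```

Swapping `embed` for a real sentence or word embedding model would let the search match queries that share meaning but not exact words with a document, which is the advantage the abstract attributes to vector-based retrieval; the indexing, ranking, and evaluation steps stay the same.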


Published

2025-08-26

Section

Articles