How to Work with Your Own Files in ChatGPT

10.03.2025

When you want ChatGPT to include data from your own files in its responses (via semantic search), you have several options:

Use your own vector database and either bypass ChatGPT or use a model for vectorization
For example, you can use the OpenAI embedding models (see OpenAI Embeddings Documentation). With this approach, your system searches for the best-matching text in your vector database to answer the user's query. A demo can be found at SemanticDemo (registration with a Google account is required).
The downside is that the text can sometimes be confusing, as it may be taken out of context. Therefore, it is important to include information about where in the document the excerpt originated (such as the chapter name, page number, etc.).
Use the same vector database as above, but let ChatGPT "wrap" the answer
This approach gives you a more readable final response, because ChatGPT can rephrase the text in a more natural way. However, it can be difficult to trace exactly which document passage was used to generate the answer, making it harder to verify the source.
Upload your documents to OpenAI's vector storage
In this setup, OpenAI automatically finds the necessary information in the document, reformulates it into natural language, and then provides the final answer. The main drawback is that it becomes even harder to determine exactly where ChatGPT got its information. Even if you explicitly instruct ChatGPT to look only in the uploaded file, there is no guarantee that it follows that instruction precisely.

Each of these methods has its own advantages and trade-offs. When choosing your approach, consider whether clear traceability to the original source is important, or if the fluency and completeness of the final answer matters more.

Screenshots of the described methods – query: "How do I add an admin user to the system?"
File: Administrator's Manual RHEL 7

1/ Output Directly from the Vector Database
The system displays the page of text, calculates relevance as a percentage, provides the relevant portion of the text, and highlights it based on how closely it matches the query (the greener the highlight, the higher the relevance). It then automatically scrolls the displayed PDF to the relevant page.

2/ ChatGPT (in this case the o3-mini model) searches the vector database for the answer and "wraps" it into a readable format.

3/Output from ChatGPT, which draws from a document uploaded to the OpenAI vector database.