Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/app updates #5056

Closed
wants to merge 44 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
44 commits
Select commit Hold shift + click to select a range
4f75eea
config updates
aginai Nov 11, 2023
165860b
config updates
aginai Nov 11, 2023
a1d7dfc
navigation config updates
aginai Nov 11, 2023
5fb6b00
beta testing
aginai Dec 10, 2023
f3b4472
address fix
aginai Dec 10, 2023
799b389
beta testing
aginai Dec 10, 2023
7eae802
tweak look and feel
aginai Dec 10, 2023
66b6e7c
Merge branch 'master' of https://github.com/aginai/aginai.github.io
aginai Dec 10, 2023
b0a9714
navigation removal
aginai Dec 10, 2023
9e467db
indent fix
aginai Dec 10, 2023
a6a30fe
fix site
aginai Dec 10, 2023
58cd799
remove image req
aginai Dec 10, 2023
9ff668c
remove all changes
aginai Dec 10, 2023
a5acc17
revert
aginai Dec 10, 2023
53d1588
restore default
aginai Dec 10, 2023
18528b9
some changes
aginai Dec 10, 2023
b5e8b83
cleanup
aginai Dec 10, 2023
334de46
updating posts
aginai Dec 11, 2023
a46af6c
trial
aginai Dec 11, 2023
987dc6c
trial2
aginai Dec 11, 2023
5466b1c
udpate pdf
aginai Dec 11, 2023
db39c9d
test
aginai Dec 11, 2023
51383a5
attachment
aginai Dec 11, 2023
fdc13ef
path
aginai Dec 11, 2023
9f466f0
cleanup
aginai Jan 1, 2024
5c56a66
update link
aginai Jan 2, 2024
359bebd
update link to paper
aginai Jan 2, 2024
c1fdd64
paper loc update
aginai Jan 2, 2024
31b2fa4
pdf loc
aginai Jan 2, 2024
6948b59
update about page
aginai Jan 2, 2024
12fe81e
update navigation to about
aginai Jan 2, 2024
6288448
test nav
aginai Jan 2, 2024
3be83f6
update nav file
aginai Jan 2, 2024
87c4e56
update about
aginai Jan 2, 2024
8de4392
about update
aginai Jan 3, 2024
5e762db
pdf update
aginai Jan 3, 2024
2af3cdf
img
aginai Jan 3, 2024
6823006
update pdf
aginai May 18, 2024
805ea10
add profile pic
aginai May 18, 2024
608d173
adding image to abt
aginai May 18, 2024
04c7942
update img
aginai May 18, 2024
279c6bc
update img
aginai May 18, 2024
a416ab2
Update 2024-10-08-genai-chatbot.md
aginai Oct 8, 2024
12bb8c7
update new posts
Dec 4, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
274 changes: 0 additions & 274 deletions README.md

This file was deleted.

21 changes: 11 additions & 10 deletions _config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -12,15 +12,15 @@

# theme : "minimal-mistakes-jekyll"
# remote_theme : "mmistakes/minimal-mistakes"
minimal_mistakes_skin : "default" # "air", "aqua", "contrast", "dark", "dirt", "neon", "mint", "plum", "sunrise"
minimal_mistakes_skin : "dark" # "air", "aqua", "contrast", "dark", "dirt", "neon", "mint", "plum", "sunrise"

# Site Settings
locale : "en-US"
title : "Site Title"
title : "Home"
title_separator : "-"
subtitle : # site tagline that appears below site title in masthead
name : "Your Name"
description : "An amazing website."
name : "Abhijeet Ghawade"
description : "Data Analyst at SLB"
url : # the base hostname & protocol for your site e.g. "https://mmistakes.github.io"
baseurl : # the subpath of your site, e.g. "/blog"
repository : # GitHub username/repo-name e.g. "mmistakes/minimal-mistakes"
Expand Down Expand Up @@ -104,18 +104,18 @@ analytics:

# Site Author
author:
name : "Your Name"
name : "Abhijeet Ghawade"
avatar : # path of avatar image, e.g. "/assets/images/bio-photo.jpg"
bio : "I am an **amazing** person."
location : "Somewhere"
bio : "Data Analyst at SLB"
location : "Pune, India"
email :
links:
- label: "Email"
icon: "fas fa-fw fa-envelope-square"
# url: "mailto:your.name@email.com"
url: "mailto:abhijeet.ghawade12@gmail.com"
- label: "Website"
icon: "fas fa-fw fa-link"
# url: "https://your-website.com"
url: "https://aginai.github.io"
- label: "Twitter"
icon: "fab fa-fw fa-twitter-square"
# url: "https://twitter.com/"
Expand All @@ -124,7 +124,7 @@ author:
# url: "https://facebook.com/"
- label: "GitHub"
icon: "fab fa-fw fa-github"
# url: "https://github.com/"
url: "https://github.com/abhijeetg12"
- label: "Instagram"
icon: "fab fa-fw fa-instagram"
# url: "https://instagram.com/"
Expand Down Expand Up @@ -156,6 +156,7 @@ footer:
include:
- .htaccess
- _pages
- _posts
exclude:
- "*.sublime-project"
- "*.sublime-workspace"
Expand Down
14 changes: 7 additions & 7 deletions _data/navigation.yml
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
# main links
main:
- title: "Quick-Start Guide"
url: https://mmistakes.github.io/minimal-mistakes/docs/quick-start-guide/
# - title: "About"
# url: https://mmistakes.github.io/minimal-mistakes/about/
# - title: "Sample Posts"
# url: /year-archive/
- title: "Get to know me"
url: /about/
- title: "Personal"
url: /about/
- title: "Blogs"
url: /posts/
# - title: "Sample Collections"
# url: /collection-archive/
# - title: "Sitemap"
# url: /sitemap/
# url: /sitemap/
26 changes: 26 additions & 0 deletions _pages/about.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
---
title: About Me
permalink: /about/
image : prof_pic.jpeg
---

<!-- # About Me

Hello, I'm [Your Name]. Welcome to my personal space on the internet.


## Contact

Feel free to reach out to me on [social media](#) or via email at [[email protected]]. -->

## About me
![prof_pic](/../assets/splash/cover.jpg)
Hello!
thanks for visiting my personal website!
I am Abhijeet, working as a Data Analyst at SLB (Schlumberger) working primarily on Data Analysis.

Prior to joining SLB, I graduated from Indian Institute of Technology, Madras with a Bachelors and Masters in Electrical engineering.
I worked on my final semester project with professor Dr. Balaram Ravindran on option discovery methods for reinforcement learning.
I spend a semester at the Carnegie Mellon University at the Advanced Agents Robotics Technology Lab in 2019-20 for an undergraduate internship.
My areas of interest are Deep learning, Large Language Models and Reinforcement Learning.

6 changes: 6 additions & 0 deletions _pages/paper.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
title: SPE paper
nav: true
nav_order: 4
cv_pdf: spe-205877.pdf

---
1 change: 1 addition & 0 deletions _pages/projects.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
## Project
25 changes: 25 additions & 0 deletions _posts/2023-12-11-spe-paper.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
---
title: " Research Publication"
last_modified_at: 2018-03-20T16:00:58-04:00
categories:
- Jekyll
tags:
- update
toc: true
toc_label: "Getting Started"
---
## Cognitive HSE Risk Prediction and Notification Tool Based on Natural Language Processing
## Abstract
The focus of this work is on developing a cognitive tool that predicts the most frequent HSE hazards with
the highest potential severity levels. The tool identifies these risks using a natural language processing
algorithm on HSE leading and lagging indicator reports submitted to an oilfield services company’s global
HSE reporting system. The purpose of the tool is to prioritize proactive actions and provide focus to raise
workforce awareness.
A natural language processing algorithm was developed to identify priority HSE risks based on
potential severity levels and frequency of occurrence. The algorithm uses vectorization, compression, and
clustering methods to categorize the risks by potential severity and frequency using a formulated risk index
methodology. In the pilot study, a user interface was developed to configure the frequency and the number
of the prioritized HSE risks that are to be communicated from the tool to those employees who opted to
receive the information in a given location.

<a href="/assets/_attachments/spe-205877-ms.pdf" target="_blank"> Link to the paper </a>
136 changes: 136 additions & 0 deletions _posts/2024-10-08-genai-chatbot.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,136 @@
# Building a RAG Search System with Azure AI and Streamlit

In this post, I'll walk you through how I built a Retrieval-Augmented Generation (RAG) search system using Azure AI services and Streamlit. This project combines the power of Azure Cognitive Search with Large Language Models to create an intelligent chatbot that can answer questions based on your own knowledge base.

## What is RAG?

Before diving into the implementation details, let's understand what RAG is. Retrieval-Augmented Generation is a technique that enhances Large Language Models by allowing them to access and use external knowledge. Instead of relying solely on the model's trained knowledge, RAG systems first retrieve relevant information from a curated knowledge base and then use that information to generate more accurate and contextual responses.

## System Architecture

The system consists of several key components:

1. **Document Processing Pipeline**: Handles PDF documents by:
- Splitting them into pages
- Extracting text (including tables)
- Chunking content into manageable sections
- Uploading to Azure Blob Storage

2. **Azure Cognitive Search**: Indexes and stores the processed documents for efficient retrieval

3. **Streamlit Interface**: Provides a user-friendly chat interface with:
- Persona selection
- Conversation history
- Dynamic response generation

## Key Implementation Features

### Document Processing

One of the most interesting aspects of this project is how it handles document processing. The system can process PDF documents intelligently:

```python
def get_document_text(filename):
if localpdfparser:
reader = PdfReader(filename)
pages = reader.pages
for page_num, p in enumerate(pages):
page_text = p.extract_text()
page_map.append((page_num, offset, page_text))
else:
form_recognizer_client = DocumentAnalysisClient(...)
# Use Azure Form Recognizer for advanced document analysis
```

The system can even handle complex tables by converting them to HTML format:

```python
def table_to_html(table):
table_html = "<table>"
rows = [sorted([cell for cell in table.cells if cell.row_index == i],
key=lambda cell: cell.column_index)
for i in range(table.row_count)]
# Process table structure...
return table_html
```

### Smart Text Chunking

To optimize search performance, the system implements intelligent text chunking:

1. Maintains context across chunks with overlapping sections
2. Respects sentence boundaries
3. Handles special cases like tables spanning multiple chunks

### Azure Search Integration

The search index is designed to support semantic search capabilities:

```python
search_index = SearchIndex(
name=index,
fields=[
SimpleField(name="id", type="Edm.String", key=True),
SearchableField(name="content", type="Edm.String", analyzer_name="en.microsoft"),
SimpleField(name="category", type="Edm.String", filterable=True, facetable=True),
# Additional fields...
],
semantic_settings=SemanticSettings(...)
)
```

## The User Experience

The Streamlit interface provides a clean, chat-like experience where users can:

1. Select different personas (e.g., TLM Manager, PSD Manager)
2. Ask questions naturally
3. See conversation history
4. Get responses with source references

## Deployment and Configuration

The system is designed to be easily deployable with minimal configuration:

1. Configure Azure services in `secrets.toml`:
```toml
[default]
searchservice = "your-search-service-name"
searchkey = "your-search-service-admin-api-key"
index = "your-index-name"
```

2. Run with a simple command:
```bash
streamlit run app.py
```

## Technical Considerations

When building this system, I had to address several technical challenges:

1. **Document Processing**: Handling various PDF formats and table structures
2. **Chunking Strategy**: Balancing chunk size with context preservation
3. **Search Relevance**: Tuning search parameters for optimal results
4. **Response Generation**: Creating contextual prompts for the LLM

## Future Improvements

Some potential enhancements I'm considering:

1. Multi-language support
2. Advanced document preprocessing options
3. Custom embeddings for improved search relevance
4. Real-time document updates

## Conclusion

This RAG search system demonstrates how to combine Azure AI services with modern UI frameworks to create a powerful, user-friendly search experience. The modular architecture makes it easy to extend and customize for different use cases.

Feel free to check out the [GitHub repository](https://github.com/your-username/rag-search-azure-ai) for the complete code and documentation.

## Resources

- [Azure Cognitive Search Documentation](https://docs.microsoft.com/en-us/azure/search/)
- [Streamlit Documentation](https://docs.streamlit.io/)
- [Python PDF Processing](https://pypdf.readthedocs.io/)
Loading
Loading