shamikbose/README.md

Hi there 👋

I'm Shamik, and I enjoy building solutions to problems, mostly through programming (and occasionally with WD-40). I work as a Lead Data Scientist, building machine learning applications that detect and anonymize PII and PHI in data breaches. I also contribute part-time to the BigScience Workshop, the BigBIO effort and the BigCode Project from 🤗. In addition, I am working with PIISA, a group of data scientists, software developers and lawyers, to establish an open standard for PII protection that can be used across the globe. You can follow our efforts here. I also like to cook 👨‍🍳

├── Interests
│   ├── Natural Language Processing
│   ├── Explainable Machine Learning
│   ├── AI Ethics
│   ├── System Design
│   └── PII Anonymization
├── Occupations
│   ├── Software Engineer
│   ├── Graduate Research Assistant
│   ├── Lead Data Scientist
│   └── Senior Researcher
├── Locations
│   ├── Kolkata, India
│   ├── Boston, MA, USA
│   ├── Tallahassee, FL, USA
│   └── Leeds, England
└── Book Suggestions
    ├── Fiction
    │   ├── The Three-Body Problem - Cixin Liu
    │   ├── All the Light We Cannot See - Anthony Doerr
    │   └── Purple Hibiscus - Chimamanda Ngozi Adichie
    ├── Non-Fiction
    │   ├── Algorithms of Oppression - Safiya Umoja Noble
    │   ├── Braiding Sweetgrass - Robin Wall Kimmerer
    │   ├── The Chaos Machine - Max Fisher
    │   ├── Viral Justice - Ruha Benjamin
    │   └── Weapons of Math Destruction - Cathy O'Neil
    └── Cookbooks
        ├── The Food Lab - J. Kenji López-Alt
        ├── Mi Cocina - Rick Martinez
        └── Dessert Person - Claire Saffitz
Projects
  1. Scientific Title Generator
  2. BigBIO dataloaders
  3. MIT 6.006 Solution Notebooks
Publications
  1. Explaining AI for Malware Detection: Analysis of Mechanisms of MalConv
  2. PhD Thesis: Towards Explainability in Machine Learning for Malware Detection
  3. Static Malware Modeling and Detection using Topic Models
  4. BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing
  5. The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

P.S. The tree above was built using Rich.

Pinned

  1. MIT6.006

    This contains solutions to the problems discussed in the lectures for the "Intro to Algorithms" course. The video playlist for the course is available here: https://www.youtube.com/playlist?list=PLUl4u3cNG…

    Jupyter Notebook · 18 stars · 3 forks

  2. bigscience-workshop/biomedical

    Tools for curating biomedical training data for large-scale language modeling

    Python · 462 stars · 116 forks

  3. stellar-finetuning

    This repository contains tutorials about fine-tuning pretrained PyTorch models

    Jupyter Notebook · 9 stars · 4 forks

  4. leetcode_top

    A repo showcasing solutions to the top interview questions on LeetCode

    Python 1

  5. mlfromscratch

    A repository of Machine Learning algorithms from scratch

    Python 1

  6. pii-lib

    Forked from bigcode-project/pii-lib

    Code for PII detection and redaction in code datasets

    Python