$ whoami

I am an engineer ⚙️ who specializes in designing, developing, and deploying computational tools to solve scientific problems alongside subject matter experts in a wide range of disciplines. I use data science, AI/ML, molecular simulations, and other advanced modeling tools to make data-driven discoveries in fields like material science, nuclear chemistry, food science, and biology. I have a PhD in Chemical Engineering with a concentration in computational thermodynamics and a certificate in Computational and Information Science. You can read more about ongoing research on ResearchGate.

Broad research areas include: 🔥 Thermodynamics, 💠 Material science, 🍣 Food authenticity, 〽️ Machine Learning

PyChemAuth Chemometric Carpentry STARLINGrt FINCHnmr RAG Data Extraction Auto Prompt Optimization
CD2 Escherized Colloids PyPI Template Project Template Project Release
$ quickstart

$ man -a mahynski

Developing reproducible, transparent modeling pipelines and methods requires standardized open-source tools. PyChemAuth is the main package I have developed to help chemometricians, cheminformatics professionals, and other researchers build end-to-end data science workflows from exploratory data analysis, to model optimization and comparison, to public distribution. Most data-driven projects below rely on this package. Check out the course and API Examples for more information.

⚛️ Developing tools for advanced stable isotope and trace element metrology

tl;dr

Stable isotope ratios of light elements (e.g., H, C, O, N, S) and trace element composition profiles, together referred to as SITE (stable isotope and trace element) data, are often the preferred features for modeling the geographic origin of many consumer products, including food. These signatures are correlated with biogeochemical fractionation processes tied to local climate, geology, and pedology, which produce different transfer rates from natural sources (e.g., water, soil, atmosphere) into plant or animal tissues. Accurate measurements and predictive models of provenance are required to validate the origin and other characteristics (e.g., organic vs. conventional farming practices) of consumer products and thereby secure supply chains.
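Isotope ratio measurements like these are conventionally reported in delta notation: the per-mil deviation of a sample's heavy-to-light isotope ratio R from that of a reference material, δ = (R_sample / R_standard − 1) × 1000 ‰. The minimal sketch below assumes the ratios are already instrument-corrected; the reference ratio and sample values are illustrative assumptions, not certified values.

```python
def delta_permil(r_sample: float, r_standard: float) -> float:
    """Return the delta value (in per mil) of a sample isotope ratio.

    delta = (R_sample / R_standard - 1) * 1000, where R is the ratio of
    heavy to light isotope abundance (e.g., 13C/12C).
    """
    if r_standard <= 0:
        raise ValueError("reference ratio must be positive")
    return (r_sample / r_standard - 1.0) * 1000.0


# Illustrative numbers only: R_STANDARD is an assumed reference 13C/12C ratio,
# not the certified value for any particular reference material.
R_STANDARD = 0.0112
samples = {"sample_a": 0.010900, "sample_b": 0.011000}

for name, ratio in samples.items():
    print(f"{name}: delta13C = {delta_permil(ratio, R_STANDARD):+.1f} per mil")
```

Negative values simply mean the sample is depleted in the heavy isotope relative to the reference.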

Products


💧 Predicting fluid phase thermodynamic properties with deep learning and coarse-grained modeling

tl;dr

The design of next-generation functional materials, central to numerous modern technologies, relies heavily on accurate thermophysical property models of chemical mixtures. Molecular-level models are required to understand their behavior and basic physics. Developing these models is computationally expensive, so coarse-grained (simplified) force fields, along with predictive models that remain accurate well beyond their training data (i.e., are highly transferable), are required. "Thermodynamic extrapolation" is a method I developed at NIST to extract orders of magnitude more data and predictive capability from existing molecular simulations; it has since been improved and advanced by others. See NIST Accolade for details.
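The core idea behind thermodynamic extrapolation can be illustrated at first order. In the canonical ensemble, with Boltzmann weight exp(−βU), the derivative of an observable's average with respect to inverse temperature is a fluctuation, d⟨A⟩/dβ = −(⟨AU⟩ − ⟨A⟩⟨U⟩), provided A itself does not depend on β. Samples collected at a single β₀ can therefore predict ⟨A⟩ at nearby β. The function below is a minimal first-order sketch under those assumptions, not the published implementation.

```python
import math
from statistics import fmean


def extrapolate_first_order(a_samples, u_samples, beta0, beta):
    """First-order estimate of <A> at inverse temperature beta.

    Uses samples of an observable A and energy U drawn from the canonical
    ensemble at beta0, together with the fluctuation identity
        d<A>/dbeta = -(<A U> - <A><U>),
    which holds when A itself does not depend on beta.
    """
    if len(a_samples) != len(u_samples) or not a_samples:
        raise ValueError("need equal-length, non-empty sample lists")
    mean_a = fmean(a_samples)
    mean_u = fmean(u_samples)
    mean_au = fmean(a * u for a, u in zip(a_samples, u_samples))
    derivative = -(mean_au - mean_a * mean_u)
    return mean_a + (beta - beta0) * derivative


# Toy check: a two-level system with energies 0 and 1. At beta0 = 0 both
# levels are equally likely, so the list [0.0, 1.0] reproduces the exact
# ensemble averages; here the observable is the energy itself (A = U).
energies = [0.0, 1.0]
beta0, beta = 0.0, 0.1
predicted = extrapolate_first_order(energies, energies, beta0, beta)
exact = math.exp(-beta) / (1.0 + math.exp(-beta))
print(f"first-order estimate: {predicted:.5f}, exact: {exact:.5f}")
```

For this toy system the first-order estimate agrees with the exact average to within about 2 × 10⁻⁵; accuracy degrades as β moves farther from β₀, which is why higher-order terms matter in practice.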

Products

Selected Publications


🍓 Authenticating food labeling claims with machine learning and statistical modeling

tl;dr

Food fraud refers to the deliberate substitution, addition, tampering, or misrepresentation of food for the express purpose of economic gain for the seller. It has been estimated to cost the global food industry more than $10 billion per year, although expert estimates from the US FDA put the cost as high as $40 billion per year, with fraud impacting 10% of all commercially sold food. Beyond the economic losses, it creates risks to public health and erodes consumer trust. Accurate measurements and predictive models of food provenance are required to combat this. While there are many conventional chemometric tools designed for this task, the recent resurgence of interest in machine learning algorithms, which have achieved previously unparalleled accuracy on many predictive tasks, raises the question of whether similar gains can be made in this arena. Here we build and compare state-of-the-art models for food authentication to determine the impact that AI/ML algorithms can have on a field that is typically plagued by small amounts of reliable data and where models require a high degree of explainability to be legally implemented.
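When reliable data are scarce, models are commonly compared with resampling schemes such as leave-one-out cross-validation (LOOCV), where each sample is held out once and predicted by a model trained on all the others. The sketch below compares two deliberately simple, interpretable classifiers (nearest centroid and 1-nearest neighbor) this way. It is an illustration of the evaluation idea only: the feature vectors are invented, and these are not the models used in the publications.

```python
import math
from statistics import fmean


def nearest_centroid(train_x, train_y, x):
    """Predict the class whose mean feature vector is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, y in zip(train_x, train_y) if y == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def one_nearest_neighbor(train_x, train_y, x):
    """Predict the class of the single closest training sample."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[best]


def loocv_accuracy(model, xs, ys):
    """Leave-one-out cross-validated accuracy of model(train_x, train_y, x)."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        correct += model(train_x, train_y, xs[i]) == ys[i]
    return correct / len(xs)


# Hypothetical two-feature profiles (e.g., two isotope ratios) for authentic
# vs. adulterated products; the values are invented for illustration.
xs = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.1, 0.4], [2.9, 0.6], [3.0, 0.5]]
ys = ["authentic"] * 3 + ["adulterated"] * 3

for model in (nearest_centroid, one_nearest_neighbor):
    print(f"{model.__name__}: LOOCV accuracy = {loocv_accuracy(model, xs, ys):.2f}")
```

Because every sample is used for testing exactly once, LOOCV makes the most of a small dataset, at the cost of retraining the model once per sample.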

Publications


🐦 Analyzing trends in biorepositories using explainable machine learning

tl;dr

Environmental monitoring efforts often rely on the bioaccumulation of persistent, often anthropogenic, chemical compounds in organisms to create a spatiotemporal record of ecosystems. Samples from various species are collected and cryogenically stored in biobanks to create a historical record. Compounds generally accumulate in upper trophic-level organisms due to biomagnification, reaching levels that can be detected with modern chemical instruments. However, identifying proper indicators of global trends is complicated by the complexity and size of many ecosystems of interest, e.g., the Pacific Ocean. Intercorrelation between compounds often results from the origin, uptake, and transport of these contaminants throughout the ecosystem, and may be affected by organism-specific processes such as biotransformation. We developed explainable machine learning models that perform nearly as well as state-of-the-art "black boxes" to make predictions about the environment and the organisms within it. The benefits of interpretability usually outweigh the improved accuracy of more complex models: interpretable models help reveal rational, explainable trends that engender trust, and they are considered more reliable.

Publications


🦠 Biomarkers and -omics applications

tl;dr

Understanding complex biochemical systems requires advanced tools, many of which have been greatly improved by advancements in artificial intelligence. Much of my background in this area involves predicting or interpreting spectral measurements, such as mass spectra or HSQC NMR spectra. The majority of this work is ongoing and will be made available here when it is complete!

Publications


☢️ Identifying materials using non-targeted analysis methods

tl;dr

Each year, fewer than 5% of the nearly 25 million containers arriving at US borders are selected for physical examination, which facilitates the import of fraudulently labeled, adulterated, and illegal substances. This fraud circumvents antidumping and countervailing duties, costing the US government nearly $5 billion over the past 20 years and industries much more. Automated, high-throughput, non-destructive, general-purpose scanners that can identify materials could meet this need. Prompt gamma-ray activation analysis (PGAA) is a nuclear spectroscopy technique that meets these criteria and can provide a spectral fingerprint identifying the isotopic composition of a sample. We developed various statistical models, as well as CNN-based deep learning models, showing that many materials can be positively identified from these spectral signals under real-world, "open set" conditions, i.e., when a sample may belong to none of the materials the model was trained on.
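Under "open set" conditions a model must be able to answer "unknown" rather than force every spectrum into a known class. A minimal way to illustrate this, assuming each known material is represented by a single reference spectrum binned on a common energy grid, is to match by cosine similarity and reject the best match if it falls below a threshold. The spectra, material names, and the 0.95 threshold are hypothetical, and this simple rule is a stand-in for, not a description of, the published statistical and CNN-based models.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(spectrum, library, threshold=0.95):
    """Return the best-matching library entry, or "unknown" if no reference
    spectrum is similar enough (the open-set rejection step)."""
    best_name, best_score = "unknown", -1.0
    for name, reference in library.items():
        score = cosine_similarity(spectrum, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else "unknown"


# Hypothetical reference spectra (counts per energy bin) for two known materials.
library = {
    "material_a": [10, 50, 5, 0, 1],
    "material_b": [0, 2, 40, 30, 3],
}
print(identify([11, 48, 6, 1, 0], library))   # expected: material_a
print(identify([30, 0, 0, 2, 45], library))   # expected: unknown
```

The threshold trades false acceptances of novel materials against false rejections of known ones, so in practice it would be tuned on held-out data.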

Publications


💠 Designing colloidal self-assembly by tiling Escher-like patterns

tl;dr

Colloidal films play a central role in technologies ranging from microelectronics to pharmaceutical delivery systems. The two-dimensional (2D) pattern of the film and its void fraction control material properties like catalytic activity, mass transfer resistance, optical properties, and hydrophobicity. Scalable production of these films relies on their self-assembly, rather than directed assembly, to make them economical and practical. Engineering colloidal self-assembly to achieve specific designs often involves tuning the shape of a colloid and creating enthalpically interacting "patches" on its surface; however, the precise connection between these factors and the final self-assembled structure is still an active area of research. We developed an approach, based on a technique known as "Escherization," to design colloids in a way that enables a priori control over the final structure's porosity and symmetry simultaneously. The approach is inspired by the art of the Dutch graphic artist M. C. Escher and the mathematics behind it. Our techniques can also be used to enumerate different crystal structures and design "structure directing agents" to create arbitrary 2D patterns.
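The void fraction mentioned above is the fraction of a periodic unit cell that is not covered by particles. For a 2D pattern of polygonal particles it can be computed with the shoelace formula, as in the minimal sketch below. It assumes the listed polygons are the non-overlapping contents of one parallelogram unit cell; the particle shapes and cell vectors are hypothetical, and this is not code from the Escherization project.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices via the shoelace formula."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def void_fraction(particles, cell_a, cell_b):
    """1 - (particle area / cell area) for a parallelogram cell spanned by
    lattice vectors cell_a and cell_b; assumes particles do not overlap."""
    cell_area = abs(cell_a[0] * cell_b[1] - cell_a[1] * cell_b[0])
    covered = sum(polygon_area(p) for p in particles)
    return 1.0 - covered / cell_area


# Hypothetical example: a unit square and a right triangle in a 2 x 2 cell.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangle = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
print(f"void fraction = {void_fraction([square, triangle], (2.0, 0.0), (0.0, 2.0)):.3f}")
```

Here the particles cover 1.5 of the cell's area of 4, giving a void fraction of 0.625.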

Publications

More Information

  • For an interactive experience, check out Craig Kaplan's online demo of the tiles (and modifications thereof) on which this theory is built.

💬 Extractive summarization of scientific data and documents with large language models

tl;dr

Natural language processing (NLP) tools have seen incredible advances in recent years. Modern AI tools enable text extraction, document summarization, and corpus querying in natural language, providing a new way to interact with data. Retrieval augmented generation (RAG) is particularly useful for interacting with data that has privacy concerns associated with it. RAG systems let one parse, query, and have a "conversation" with these documents to retrieve information, create summaries, and extract data. RAG systems:
  • Are based on specific document(s)
  • Can cite their sources, making them more trustworthy
  • Do not require retraining or fine-tuning of the underlying large language model

With the right prompt optimization and topic modeling, their performance can be increased even further for domain-specific applications.
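The retrieval step of a RAG pipeline, and the reason such systems can cite their sources, can be illustrated with a toy retriever: score each document chunk against the query and return the best matches together with their source identifiers, which would then be passed to a language model as context. The sketch below is a simplified stand-in that uses bag-of-words cosine similarity from the Python standard library; production systems typically use learned vector embeddings instead, and the chunks and source IDs here are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(c1, c2):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(c1[t] * c2[t] for t in c1.keys() & c2.keys())
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(
        sum(v * v for v in c2.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k (source_id, score) pairs most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(source, cosine(q, Counter(tokenize(text)))) for source, text in chunks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical document chunks keyed by a citable source identifier.
chunks = {
    "report.pdf#p3": "Honey samples were analyzed for carbon isotope ratios.",
    "report.pdf#p7": "Trace element profiles distinguish olive oil origins.",
    "memo.txt#1": "The meeting is scheduled for Tuesday afternoon.",
}
for source, score in retrieve("Which samples had carbon isotope ratios measured?", chunks):
    print(f"{source}: {score:.2f}")
```

Keeping the source identifier attached to every retrieved chunk is what lets the generated answer point back to the passage it was drawn from, and no model weights are touched along the way.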

Products


📔 Notes and how-to guides are available as Gists.

$ cat /home/mahynski/.profile | more

Google Drive, Ubuntu, Git, GitHub, GitFlow, GitHub Actions, Visual Studio Code, Docker, FastAPI, C, C++, Python, Shell Script, Stack Overflow, CMake, Markdown, LaTeX, Colab, Gradient, Anaconda, Jupyter Notebook, scikit-learn, Keras, HuggingFace, WandB, OpenAI, LlamaIndex, LlamaParse, Arize Phoenix, Langfuse, DeepChem, NVIDIA AI Workbench, NumPy, SciPy, Pandas, Matplotlib, Plotly, Streamlit, Bokeh, Dracula, DEV Profile


Pinned

  1. pychemauth

    Chemometric analysis methods implemented in Python

    Python · 9 stars · 3 forks

  2. chemometric-carpentry

    A course in chemometric (data) carpentry.

    Jupyter Notebook · 8 stars · 1 fork

  3. usnistgov/PACCS

    Python · 2 stars · 2 forks

  4. usnistgov/escherized-colloids

    C++

  5. project-template (template repository)

    Template for new projects

    TeX

  6. Links to useful resources #notes

    # Data Science

    * [Jupyter Notebook](https://jupyter-notebook.readthedocs.io/en/stable/)
    * [Google Colab](https://colab.research.google.com)
    * [Data Carpentry](https://datacarpentry.org/lessons/)