A Firefox extension to boost my job hunt.

This post is a bit different from what I usually share.

Usually, I work with ecological data and AI, but I wanted to take a break and brush up on my JavaScript and web dev skills. So, I made a little browser extension to help with job hunting.

Job searching can get messy fast — tons of tabs, job links everywhere, and trying to match your CV to each offer. It’s easy to lose track.

That’s why I built this JobSeeker Companion: a simple Firefox extension that keeps your job search tidy and helps you spot which jobs fit your CV. No tracking, no fuss — just a handy tool to make things easier.

📋 1. Clipboard Helper

Easily copy job offer URLs as you browse. No need to dig through your history later—your links are all stored in one place.

💾 2. Save Offers

Like what you see? Save job descriptions directly from the browser and revisit them anytime.

🧠 3. Match Your CV

Paste your CV into the extension once, then check how well it matches any job offer you’re viewing. The tool highlights missing keywords to help you fine-tune your applications.

🔒 Privacy First

One important aspect I want to highlight: your data stays private. This extension does not store any of your CV or job description data anywhere. All processing happens locally on your machine, with no calls to external AI services or servers. Your information never leaves your browser, so you can use the tool with confidence and peace of mind.

⚙️ Tech used

Built as a Firefox extension using Manifest V3 and the WebExtensions API, in plain JavaScript. It injects content scripts to scrape job descriptions and performs simple token-based matching against your CV. The UI uses chrome.i18n for localization.
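
To give an idea of the matching step, here is a minimal sketch of token-based matching, written in Python for readability (the extension itself is JavaScript); the tokenizer and scoring are illustrative assumptions, not the extension's exact logic:

import re

def match_score(cv_text, job_text):
    """Share of job-offer tokens found in the CV, plus the missing keywords."""
    def tokenize(text):
        return set(re.findall(r"[a-z0-9+#.]+", text.lower()))
    cv_tokens, job_tokens = tokenize(cv_text), tokenize(job_text)
    missing = job_tokens - cv_tokens  # keywords worth adding to the CV
    score = 1 - len(missing) / max(len(job_tokens), 1)
    return score, missing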

🧪 Try It Now

The extension is available on Firefox Add-ons and on my GitHub.

🚀 What’s Next?

Future updates might include semantic similarity analysis using embeddings to improve matching accuracy, as well as support for more job platforms. Currently, the extension scrapes entire pages, except on LinkedIn, where it targets only the job description panel.

If you have ideas or want to help develop these features, feel free to reach out or contribute on GitHub!

🙌 Support the Project

If this tool helps you during your job search, you can buy me a coffee ☕💖.


Marine Macrofauna Detection: A Toy Model for Hydrophone and Marked Individuals Detection

In this project, I have developed a simple toy model that simulates cetacean (or macrofauna) movement patterns and combines two detection methods: hydrophones and marked individuals. The goal is to explore how these two detection techniques can work together to improve the monitoring and conservation of marine species.

Project Overview

The model simulates a population of cetaceans moving within a defined area, with hydrophones placed strategically to detect them. Additionally, a subset of cetaceans is marked, and their movements are tracked separately. By combining both methods, I aim to assess how well each detection technique performs and how they can complement each other.

The simulation generates interactive visualizations where you can explore the cetacean movement and detection data. The results include real-time plots of cetacean trajectories, density heatmaps, and the Mean Squared Error (MSE) between the actual and detected positions.

Features

  • Cetacean Movement Simulation: Cetaceans move through the study area, with a correlation parameter controlling how strongly their movements are coupled.
  • Hydrophone Detection: Hydrophones are placed at random, and cetaceans are detected based on their proximity to a hydrophone.
  • Marked Individuals: A subset of cetaceans is marked, and their movements are tracked separately to assess the detection accuracy for marked individuals.
  • Error Analysis: The Mean Squared Error (MSE) metric is used to compare the detected positions against the actual positions, allowing for performance evaluation.
  • Interactive Dashboard: A Streamlit web app enables interactive exploration of the simulation, where users can adjust parameters and visualize results.

You can explore the live web app here and the code is available here.

Combining (in purple) passive acoustic detection (data from hydrophones in blue) and marked individuals data (in red) may decrease the global error.

Visualization

The web app presents a visualization of the simulation results. It includes:

  • Density Heatmaps for both the marked cetaceans and those detected by hydrophones.
  • Trajectories showing the movement paths of cetaceans over time.
  • Error Metrics that visually show how accurate the detection methods are by comparing the simulated positions to the detected positions (a minimal sketch of this follows the list).
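
As a rough illustration of that last metric, the MSE between true and detected positions could be computed like this (a sketch assuming paired (x, y) arrays; the app's exact computation may differ):

import numpy as np

def position_mse(actual, detected):
    """Mean squared error between true and detected (x, y) positions."""
    actual, detected = np.asarray(actual), np.asarray(detected)
    return float(np.mean(np.sum((actual - detected) ** 2, axis=1)))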

In the Streamlit interface, the user is presented with a set of input parameters to configure the simulation. Here's a breakdown of each parameter; a code sketch showing how they might fit together follows the list:

  • Number of Cetaceans (N): the total number of cetaceans simulated in the study area.
  • Marked Cetaceans (M): how many of those cetaceans are marked for tracking; these represent the subset of the population whose movements are followed directly.
  • Correlation Strength (correlation_strength): the strength of the movement correlation between cetaceans; a value close to 1 means highly correlated movement, while a value close to 0 means independent movement.
  • X Limit (xlim): the extent of the simulation area in the X (horizontal) direction; the maximum possible X coordinate of any cetacean.
  • Y Limit (ylim): the extent of the simulation area in the Y (vertical) direction; the maximum possible Y coordinate of any cetacean.
  • Steps (steps): the number of time steps in the simulation; each step is a discrete interval during which cetaceans move and may be detected.
  • Number of Hydrophones (num_hydrophones): the number of hydrophones used to detect cetaceans.
  • Detection Range (detection_range): the maximum detection range of the hydrophones; cetaceans within this distance of any hydrophone are detected.
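
To make these parameters concrete, here is a minimal sketch of the simulation loop, assuming the movement model is a correlated random walk and detection is a simple within-range test; the names mirror the parameters above, but the app's actual implementation may differ:

import numpy as np

def simulate(N=50, M=10, correlation_strength=0.5, xlim=100, ylim=100,
             steps=200, num_hydrophones=5, detection_range=15, seed=0):
    rng = np.random.default_rng(seed)
    positions = rng.uniform([0, 0], [xlim, ylim], size=(N, 2))      # initial positions
    hydrophones = rng.uniform([0, 0], [xlim, ylim], size=(num_hydrophones, 2))
    marked = rng.choice(N, size=M, replace=False)                   # indices of marked individuals
    trajectories, detections = [positions.copy()], []
    for _ in range(steps):
        shared = rng.normal(0, 1, size=2)      # drift shared by the whole group
        own = rng.normal(0, 1, size=(N, 2))    # individual movement noise
        step = correlation_strength * shared + (1 - correlation_strength) * own
        positions = np.clip(positions + step, [0, 0], [xlim, ylim])
        # a cetacean is detected if it is within detection_range of any hydrophone
        dists = np.linalg.norm(positions[:, None, :] - hydrophones[None, :, :], axis=2)
        detections.append(np.flatnonzero((dists < detection_range).any(axis=1)))
        trajectories.append(positions.copy())
    return np.array(trajectories), detections, marked, hydrophones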

Methodology

For the density estimation, I use Kernel Density Estimation (KDE) (https://en.wikipedia.org/wiki/Kernel_density_estimation), which is a non-parametric way to estimate the probability density function of a random variable. This method works well in this case to visualize the concentration of cetaceans across the space. However, other techniques, such as Kriging or spatial interpolation methods, could also be applied for density estimation, depending on the specific needs of the simulation and the available data.
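
As an illustration of the KDE step, a density surface can be computed from detected positions with scipy's gaussian_kde (a sketch with stand-in data, not the app's exact code):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Stand-in for detected cetacean positions, shape (n_points, 2)
detected_xy = np.random.default_rng(0).uniform(0, 100, size=(200, 2))

kde = gaussian_kde(detected_xy.T)  # gaussian_kde expects shape (n_dims, n_points)
xs, ys = np.mgrid[0:100:100j, 0:100:100j]
density = kde(np.vstack([xs.ravel(), ys.ravel()])).reshape(xs.shape)

plt.imshow(density.T, origin='lower', extent=(0, 100, 0, 100), cmap='viridis')
plt.colorbar(label='Estimated density')
plt.show()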

Conclusion

This project is a simple exploration of cetacean detection techniques, using a toy model to combine two commonly used methods: hydrophones and marked individuals. While the model is relatively basic, it provides valuable insights into the detection process and highlights potential challenges in real-world cetacean monitoring and conservation efforts, especially when multiple data sources are involved.

Feel free to explore the web app, interact with the parameters, and see how the simulation performs under different conditions. This model could be expanded to include other detection methods, species, or environmental factors.


Check out the live simulation here!


Creating a Fish-Focused Chatbot with OpenAI and GBIF API: A Step-by-Step Guide!

In a previous post, I detailed how to build a fish information-fetching web application with Flask, which you can check out here. Now, I’m excited to take it a step further and explore how to create a specialized chatbot focusing on marine science, particularly fish species and their distribution. 🐠

Chatbots have become an integral part of user interaction, providing instant answers and improving user experience. This project leverages OpenAI's powerful language model and the Global Biodiversity Information Facility (GBIF) API to provide accurate and detailed responses.

Project Overview

The objective of the chatbot is to respond to inquiries about marine species, habitats, and behaviors while also being able to suggest follow-up questions and retrieve taxonomic data. The bot is designed to handle specific questions, guide users to deeper knowledge, and ensure that all interactions maintain a structured format. This has been the most challenging aspect, as LLMs often struggle to retain instructions given earlier in the conversation.

Step 1: Setting Up the Environment

Before diving into coding, ensure you have the necessary tools installed:

  • Node.js: This will help run our server-side code.
  • OpenAI API Key: You need an account with OpenAI to access their models. Secure your API key from the OpenAI website.
  • GBIF API: This API will allow us to retrieve taxonomic data on fish species.

Step 2: Building the Chatbot

Here’s how I structure the chatbot using JavaScript:

Importing Dependencies

We will use the openai package to access the OpenAI API and the browser's fetch to call the GBIF API.

import { process } from '/env.js';  // env.js exposes the API key without hard-coding it here
import { Configuration, OpenAIApi } from 'openai';

Configuration

Set up the OpenAI configuration with the API key.

const configuration = new Configuration({
    apiKey: process.env.OPENAI_API_KEY
});

const openai = new OpenAIApi(configuration);

Defining the Expected Response Format

We create a structure that the bot will follow for its responses:

const expectedFormat = {
    response: '',
    keywords: [],
    follow_up_questions: [],
    taxon: 'none'
};

Conversation Array

The conversation history is stored in an array, which includes a system message that outlines the bot's purpose and constraints.

const conversationArr = [
    {
        role: 'system',
        content: `You are an expert in marine science...`
    }
];

This is where I can constrain the behavior of the bot, making it not really fun at parties.

Step 3: User Input Handling

To interact with users, we create an event listener that captures input and processes it:

document.addEventListener('submit', (e) => {
    e.preventDefault();
    const userInput = document.getElementById('user-input');   
    // Create speech bubble and add user input to conversation
    ...
    fetchReply(); // Call the function to get the bot's reply
});

Step 4: Fetching Replies from OpenAI

The fetchReply function sends the user's input to the OpenAI API, processes the response, and updates the conversation history:

async function fetchReply() {
    const response = await openai.createChatCompletion({
        model: 'gpt-4',
        messages: conversationArr,
    });
    // Process and handle the response
    ...
}

Step 5: Taxonomic Data Retrieval

To enrich responses with relevant taxonomic information, we call the GBIF API based on the taxon extracted from the chatbot's reply:

async function getTaxonKey(taxon) {
    const url = `https://api.gbif.org/v1/species/match?name=${encodeURIComponent(taxon)}`;
    const response = await fetch(url);
    const data = await response.json();
    return data.usageKey; // GBIF's key for the matched taxon, if any
}

If the taxon is found, a distribution map is plotted on the left using the corresponding API call. I've opted for one of the many styles available from the GBIF API, but the others are really cool too!
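
For reference, the map tiles come from the GBIF maps API; a tile request looks roughly like the template below, where taxonKey is the key returned by the match endpoint and the style name is one of several documented options (treat the exact parameters as an approximation and check the maps API docs):

https://api.gbif.org/v2/map/occurrence/density/{z}/{x}/{y}@1x.png?taxonKey={taxonKey}&style=purpleYellow.point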

Step 6: Rendering Responses and Suggestions

Once the bot has generated a reply, we render it on the UI, implementing a typewriter effect for enhanced user engagement. We also provide suggestion buttons for follow-up questions:

function renderTypewriterText(text) {
    // Create a typewriter effect for the bot's reply
    ...
}

function renderSuggestions(keywords, follow_up_questions) {
    // Create suggestion buttons for follow-up questions
    ...
}

Step 7: Testing and Iteration

After implementing the chatbot, thoroughly test it with various inputs to ensure it functions as intended. Pay attention to how it handles edge cases, such as invalid queries or API errors.

Next Steps

Looking ahead, I’d like to connect the bot to the FishBase API, which would provide more quantitative data about fish species. This would make the interaction even more informative. I also think it would be great to include scientific references alongside the bot’s responses, so users can dig deeper into the research if they want to learn more.

The whole project is available on my GitHub: Chatbot Whaly.


Combining Models for Better Predictions: Stacking in Machine Learning

What is Stacking?

Stacking is an ensemble learning technique that combines the predictions of multiple base models (level 0 models) to generate a final prediction using a meta-model (level 1 model). Unlike simple voting or averaging methods, stacking uses a meta-model to learn how to best combine the predictions of base models, thereby capturing complex patterns and relationships in the data.

How Stacking Works:

  1. Base Models (Level 0 Models): These are the individual models that are trained on the same dataset. They could be of different types, such as a decision tree, a k-nearest neighbors model, or a support vector machine.

  2. Meta-Model (Level 1 Model): The predictions of the base models are used as features to train a meta-model. This model learns the optimal way to combine the base models' predictions to improve accuracy.

  3. Final Prediction: The meta-model produces the final prediction by integrating the predictions of the base models.

Why use Stacking?

  • Improved Performance: By combining multiple models, stacking can often outperform any single model. It leverages the strengths of each base model while mitigating their weaknesses.

  • Flexibility: Stacking allows you to combine different types of models, making it versatile for various datasets and problems.

  • Reduced Overfitting: The meta-model can learn to generalize better by combining the predictions of overfitted base models, leading to a more robust final model.

The main drawback of stacking is training time: the meta-model is trained on out-of-fold predictions from every base model, so each base model is fitted several times. This is computationally expensive and time-consuming, especially for large datasets.

Practical Example: Stacking in Action - Predicting Poisonous Mushrooms on Kaggle

For this practical example, we'll walk through how I used stacking to participate in the Kaggle competition Playground Series - Season 4, Episode 8: Binary Prediction of Poisonous Mushrooms. The goal of the competition is to predict whether a mushroom is edible or poisonous based on its physical characteristics.

I'll skip the loading and pre-processing parts, which you can find in my Jupyter notebook.

Once the data are correctly formatted, I trained three different models as the base learners:

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Initialize classifiers
rf = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
gb = GradientBoostingClassifier(n_estimators=100, random_state=42)
knn = KNeighborsClassifier(n_neighbors=5)

# Train classifiers
rf.fit(X_TRAIN, Y_TRAIN)
gb.fit(X_TRAIN, Y_TRAIN)
knn.fit(X_TRAIN, Y_TRAIN)

# Predict on the validation set
y_pred_rf = rf.predict(X_VAL)
y_pred_gb = gb.predict(X_VAL)
y_pred_knn = knn.predict(X_VAL)

To enhance the prediction accuracy, I combined these base models using a stacking approach:

from sklearn.ensemble import StackingClassifier
# Define base learners
base_learners = [
    ('rf', rf),
    ('gb', gb),
    ('knn', knn)
]

# Define meta-learner
meta_learner = LogisticRegression()

# Initialize Stacking Classifier
stacking_clf = StackingClassifier(estimators=base_learners, final_estimator=meta_learner)

# Train Stacking Classifier
stacking_clf.fit(X_TRAIN, Y_TRAIN)

# Predict on validation set
y_pred_stacking = stacking_clf.predict(X_VAL)

Finally, I evaluated the performance of each base model and the stacked model on the validation set to see the benefits of stacking:

import matplotlib.pyplot as plt
from sklearn.metrics import matthews_corrcoef

# Calculate mcc for each model
mcc = {
    'Random Forest': matthews_corrcoef(Y_VAL, y_pred_rf),
    'Gradient Boosting': matthews_corrcoef(Y_VAL, y_pred_gb),
    'KNN': matthews_corrcoef(Y_VAL, y_pred_knn),
    'Stacking': matthews_corrcoef(Y_VAL, y_pred_stacking)
}


# Sort MCC values for better visualization
sorted_mcc = dict(sorted(mcc.items(), key=lambda item: item[1]))

# Plot the MCCs
plt.figure(figsize=(10, 6))
bars = plt.barh(list(sorted_mcc.keys()), list(sorted_mcc.values()), color=['#3498db', '#2ecc71', '#e74c3c', '#9b59b6'])

# Add MCC values to the bars
for bar in bars:
    plt.text(bar.get_width() + 0.01, bar.get_y() + bar.get_height()/2, 
             f'{bar.get_width():.3f}', va='center', fontsize=12)

plt.xlabel('Matthews Correlation Coefficient (MCC)', fontsize=14)
plt.title('Comparison of Base Models and Stacking Model', fontsize=16)
plt.xlim([0., 1.06])
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.show()

In this Kaggle competition, performance is evaluated using the Matthews Correlation Coefficient (MCC), a metric for binary classification that takes into account true and false positives and negatives, providing a balanced measure even when the classes are imbalanced.
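
For reference, MCC is computed from the four cells of the confusion matrix:

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))

It ranges from −1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction).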

Here, the stacking model slightly outperformed the individual base models. While the improvement may seem marginal, in high-stakes scenarios, even small gains in performance can be critical.


Visualizing Fisheries Data with Sankey Diagrams

Sankey diagrams are an excellent tool for visualizing the flow of quantities between categories. In the context of fisheries data, they help illustrate how different fishing methods contribute to the harvest of various species. This guide will show you how to use a Python script to generate code for Sankey diagrams in different formats.

Overview

The Python script (available on my GitHub) can read a CSV file and generate code snippets for Sankey diagrams in the following formats:

  • SankeyMATIC: A web-based tool for creating Sankey diagrams.
  • Python (using Plotly): An interactive plotting library for Python.
  • R (using networkD3): An R package for interactive network diagrams.

How It Works

1. Prepare Your CSV File

Your CSV file should include these columns:

  • Source: The origin of the flow (e.g., fishing method).
  • Target: The destination of the flow (e.g., species caught).
  • Value: The quantity of the flow (e.g., weight of fish).

Example CSV File:

Engine,Species,Weight
Trawler,Cod,500
Longline,Cod,200
Trawler,Haddock,300
Longline,Haddock,100
...

2. Using the Python Script

The Python script processes your CSV file and generates code snippets for various Sankey diagram formats. You can specify the desired output format using the --output option:

  • sankeymatic: Generates code compatible with the SankeyMATIC web tool.
  • python: Produces code for creating interactive Sankey diagrams in Python using Plotly.
  • r (networkD3): Creates code for generating Sankey diagrams in R using the networkD3 package.
  • all: Outputs code snippets for all the above formats.

Example:

  python sankey_formatter_all.py data.csv --output sankeymatic

The Python and R outputs are pretty basic and can be enhanced by playing with each library's options.
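
For the Python option, the generated code has roughly the shape of the minimal Plotly example below, built here from the sample CSV above (a sketch of the general pattern; the script's actual output may differ):

import plotly.graph_objects as go

# Nodes are the unique sources and targets; links reference them by index.
labels = ["Trawler", "Longline", "Cod", "Haddock"]
fig = go.Figure(go.Sankey(
    node=dict(label=labels, pad=15, thickness=20),
    link=dict(
        source=[0, 1, 0, 1],         # indices into labels (flow origins)
        target=[2, 2, 3, 3],         # indices into labels (flow destinations)
        value=[500, 200, 300, 100],  # flow quantities (weights from the CSV)
    ),
))
fig.update_layout(title_text="Catch by fishing method", font_size=12)
fig.show()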

By copy/pasting the generated code into SankeyMATIC, it's easy to tweak the resulting Sankey diagram as you need.