Antoine's Blog


September 2024

  • Creating a Fish-Focused Chatbot with OpenAI and GBIF API: A Step-by-Step Guide! In a previous post, I detailed how to build a fish information-fetching web application with Flask, which you can check out here. Now, I'm excited to take it a step further. Chatbots have become an integral part of user interaction, providing instant answers and improving user experience, and in this article I'll explore how to create a specialized chatbot that focuses on marine science, particularly fish species and their distribution. 🐠 This project leverages OpenAI's powerful language model and the Global Biodiversity Information Facility (GBIF) API to provide accurate and detailed responses. Project Overview: The objective of the chatbot is to respond to inquiries about marine species, habitats, and behaviors while also being able to suggest follow-up questions and retrieve…

    Permanent link to “Creating a Fish-Focused Chatbot with OpenAI and GBIF API: A Step-by-Step…”
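As a minimal sketch of the GBIF half of the chatbot above: GBIF exposes a species-match endpoint that resolves a scientific name to a `taxonKey`, and an occurrence-search endpoint keyed on it. The helper names below are illustrative, not the post's exact code; only the endpoints are GBIF's real API.

```python
from urllib.parse import urlencode

GBIF_API = "https://api.gbif.org/v1"

def species_match_url(name):
    """Build the GBIF species-match URL that resolves a scientific name to a taxonKey."""
    return f"{GBIF_API}/species/match?{urlencode({'name': name})}"

def occurrence_search_url(taxon_key, limit=20):
    """Build the GBIF occurrence-search URL for a resolved taxonKey."""
    return f"{GBIF_API}/occurrence/search?{urlencode({'taxonKey': taxon_key, 'limit': limit})}"

print(species_match_url("Gadus morhua"))
# https://api.gbif.org/v1/species/match?name=Gadus+morhua
```

The chatbot can then feed the JSON from these URLs into the OpenAI prompt as grounding context.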

August 2024

  • Combining Models…

    What is Stacking? Stacking is an ensemble learning technique that combines the predictions of multiple base models (level 0 models) to generate a final prediction using a meta-model (level 1 model). Unlike simple voting or averaging methods, stacking uses a meta-model to learn how to best combine the predictions of base models, thereby capturing complex patterns and relationships in the data. How Stacking Works: Base Models (Level 0 Models): These are the individual models that are trained on the same dataset. They could be of different types, such as a decision tree, a k-nearest neighbors model, or a support vector machine. Meta-Model (Level 1 Model): The predictions of the base models are used as features to train a meta-model. This model learns the optimal way to combine the base models' predictions to improve accuracy. Final Prediction: The meta-model produces the final prediction by integrating the predictions of the base models. Why Use Stacking? Improved Performance: By…

    Permanent link to “Combining Models for Better Predictions: Stacking in Machine Learning”
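The two-level workflow described above maps directly onto scikit-learn's `StackingClassifier`. A minimal sketch on synthetic data (not the post's dataset), using the same base-model types the teaser mentions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Level 0: heterogeneous base models trained on the same data
base = [
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier()),
    ("svm", SVC(random_state=0)),
]

# Level 1: a logistic regression learns how to weight the base predictions
stack = StackingClassifier(estimators=base, final_estimator=LogisticRegression())
stack.fit(X_tr, y_tr)
print(round(stack.score(X_te, y_te), 3))
```

Internally, `StackingClassifier` generates the level-1 training features with cross-validated predictions, which avoids leaking the base models' training fit into the meta-model.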
  • Visualizing…

    Sankey diagrams are an excellent tool for visualizing the flow of quantities between categories. In the context of fisheries data, they help illustrate how different fishing methods contribute to the harvest of various species. This guide will show you how to use a Python script to generate code for Sankey diagrams in different formats. Overview The Python script (available on my GitHub) can read a CSV file and generate code snippets for Sankey diagrams in the following formats: SankeyMATIC: A web-based tool for creating Sankey diagrams. Python (using Plotly): An interactive plotting library for Python. R (using networkD3): An R package for interactive network diagrams. How It Works 1. Prepare Your CSV File Your CSV file should include these columns: Source: The origin of the flow (e.g., fishing method). Target: The destination of the flow (e.g., species caught). Value: The quantity of the flow (e.g., weight of fish). Example CSV File: Engine,Species,Weight Trawler,Cod,500…

    Permanent link to “Visualizing Fisheries Data with Sankey Diagrams”
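For the SankeyMATIC target above, the conversion is simple: SankeyMATIC takes one `Source [Value] Target` line per flow. A minimal sketch of that step (the function name is illustrative, not the post's script):

```python
import csv
import io

def to_sankeymatic(csv_text):
    """Emit SankeyMATIC 'Source [Value] Target' lines from a source,target,value CSV."""
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    # Skip the header row; columns are read positionally as source, target, value
    return "\n".join(f"{src} [{val}] {tgt}" for src, tgt, val in rows[1:])

data = """Engine,Species,Weight
Trawler,Cod,500"""
print(to_sankeymatic(data))
# Trawler [500] Cod
```

The output can be pasted straight into the SankeyMATIC web editor; the Plotly and networkD3 targets need node/link index lists instead.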
  • Comparing Random…

    In this post, I'll dive into a comparison of two popular machine learning models: Random Forest and Boosted Trees (XGBoost). I will use a dataset from a study on cockle densities in relation to green macroalgal (GMA) biomass in Yaquina Bay, Oregon. By analyzing their performance on this dataset, we'll explore which model is better suited for this type of ecological data. (Image credit: Yakfish Taco) Introduction to Random Forest and Boosted Trees Random Forest and Boosted Trees (XGBoost) are two ensemble learning methods widely used in machine learning for classification and regression tasks. Random Forest operates by constructing a multitude of decision trees during training and outputs the class that is the majority vote of the individual trees. This approach helps in reducing overfitting and improving model accuracy by averaging the predictions from multiple trees. On the other hand, Boosted Trees, particularly XGBoost, enhance predictive performance through a technique called boosting.…

    Permanent link to “Comparing Random Forest and Boosted Trees: An Analysis Using Cockle Field Survey Data”
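The bagging-vs-boosting contrast above can be sketched in a few lines. This uses synthetic regression data (not the Yaquina Bay survey) and scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost, so the scores are illustrative only:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the cockle density data
X, y = make_regression(n_samples=300, n_features=5, noise=10, random_state=0)

rf = RandomForestRegressor(n_estimators=200, random_state=0)  # bagging: average many independent trees
gb = GradientBoostingRegressor(random_state=0)                # boosting: fit trees sequentially on residuals

rf_r2 = cross_val_score(rf, X, y, cv=5).mean()  # default scoring for regressors is R^2
gb_r2 = cross_val_score(gb, X, y, cv=5).mean()
print(f"Random Forest R2: {rf_r2:.3f}  Boosted Trees R2: {gb_r2:.3f}")
```

Cross-validated scores, rather than a single train/test split, give a fairer comparison on small ecological datasets where one lucky split can flip the ranking.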

July 2024

  • Optimizing Circle…

    In various scientific and engineering applications, there is a need to optimally arrange objects within a given space. One fascinating instance is the problem of placing multiple circles within a rectangular area to maximize the covered area without overlapping. This task is highly relevant in experiment design. In this post, I explore a Python-based approach using the Pyomo optimization library to solve this problem. Background The circle placement problem can be categorized as a type of packing problem, which is a well-known challenge in operations research and combinatorial optimization. The primary objective is to arrange a set of circles within a bounded area such that the total covered area is maximized and no circles overlap. Why This Is Useful Optimal placement of objects is crucial in many fields: Experiment Design: Efficient use of space can lead to better experiment setups and resource utilization. Manufacturing: In industries, optimizing the layout of components can…

    Permanent link to “Optimizing Circle Placement in a Defined Area: A Pyomo-Based Approach”
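The two constraints at the heart of the packing model above are easy to state without the solver: every circle must lie inside the rectangle, and every pair of centers must be at least the sum of the radii apart. A plain-Python feasibility check of those conditions (a sketch of what the Pyomo model encodes, not the post's model itself):

```python
from math import hypot

def is_feasible(circles, width, height):
    """Check the packing constraints: each (x, y, r) circle inside the
    width x height rectangle, and no two circles overlapping."""
    for i, (x, y, r) in enumerate(circles):
        if x - r < 0 or x + r > width or y - r < 0 or y + r > height:
            return False  # circle sticks out of the rectangle
        for (x2, y2, r2) in circles[i + 1:]:
            if hypot(x - x2, y - y2) < r + r2:
                return False  # centers closer than the sum of radii -> overlap
    return True

print(is_feasible([(1, 1, 1), (3, 1, 1)], 4, 2))  # True: tangent circles, both inside
print(is_feasible([(1, 1, 1), (2, 1, 1)], 4, 2))  # False: the circles overlap
```

In the Pyomo formulation these become bound constraints on the centers and pairwise quadratic non-overlap constraints, with the covered area as the objective.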

June 2024

  • Building a Fish…

    In this post, I'll walk through a Python web application that fetches detailed information about fish species from FishBase, and then summarizes the data using OpenAI's GPT-3.5 model. This application is built using the Flask framework and includes web scraping and natural language processing. Setting Up Flask First, we import the necessary libraries and set up our Flask application:

```python
from flask import Flask, request, jsonify, render_template
import requests
from bs4 import BeautifulSoup
from openai import OpenAI
import wikipediaapi

app = Flask(__name__)
```

    Fetching Fish Information from FishBase The get_fish_info function takes the species name as input, constructs the URL for FishBase, and scrapes the information using BeautifulSoup:

```python
def get_fish_info(species):
    url = f'https://www.fishbase.se/summary/{species}'
    response = requests.get(url)
    if response.status_code != 200:
        return {"error": "Species not found or failed to fetch data"}
    soup = BeautifulSoup(response.content, 'html.parser')
```

    …

    Permanent link to “Building a Fish Information Fetching Web Application with Flask”
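Between the scraping and the GPT-3.5 call, the app needs to turn the scraped fields into a prompt. A sketch of that step, with a hypothetical helper whose field names are illustrative, not the post's exact ones:

```python
def build_summary_prompt(species, info):
    """Assemble the text sent to the chat model from scraped FishBase fields.
    Hypothetical helper: the real app's field names may differ."""
    facts = "\n".join(f"- {key}: {value}" for key, value in info.items())
    return (
        f"Summarize the following FishBase information about {species} "
        f"in two or three sentences for a general audience:\n{facts}"
    )

prompt = build_summary_prompt(
    "Gadus morhua",
    {"Environment": "marine; demersal", "Max length": "200 cm"},
)
print(prompt)
```

Keeping prompt assembly in a pure function like this makes it testable without hitting the OpenAI API.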
  • Open Fisheries…

    Open Fisheries is a platform that compiles global fishery data, offering records of global fish capture landings from 1950 onwards. I have converted the great work from the rfisheries R package to Python to facilitate data analysis. You can use the API to gather the total fish capture landings for a specific country. For example, the following chart shows the total landings for Canada: The API can also be used to gather the total fish capture landings for different species. In this example, we look at the total landings for three species: Dentex dentex (DEC), Dentex congoensis (DNC), and Dentex macrophthalmus (DEL): As an illustrative example, I focused on species that are assessed globally and present in France, which can be accessed here. To better understand the conservation status of these species, I grouped them according to their IUCN status. The IUCN Red List categorizes species based on their risk of extinction, helping to guide conservation efforts. The statuses range from…

    Permanent link to “Open Fisheries global fish capture landings”
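Once the landings records are fetched, the analysis reduces to reshaping year/catch pairs. A small sketch with sample data; the `year`/`catch` field names follow the rfisheries convention and are an assumption here, as is the helper name:

```python
def landings_series(records):
    """Turn Open Fisheries-style records into a {year: catch} mapping."""
    return {int(r["year"]): float(r["catch"]) for r in records}

# Sample records standing in for an API response
sample = [{"year": 1950, "catch": 1000.0}, {"year": 1951, "catch": 1200.0}]
series = landings_series(sample)
print(max(series, key=series.get))  # year with the largest landings
```

From there, a line chart of `series` over the years reproduces the country and species plots shown in the post.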

April 2024

  • Vessel…

    As I was gaining experience in machine learning, I found myself tumbling down a rabbit hole centered around Convolutional Neural Networks (CNNs). In particular, I wanted to experiment with these networks on spatial patterns. I stumbled upon the MovingPandas library for movement data analysis, which provides incredibly intuitive tutorials. I focused my attention on the example using AIS data published by the Danish Maritime Authority on 5 July 2017 near Gothenburg. It was a good starting point for applying a CNN. Inspired by the work of Chen et al. (2020), who trained a neural network to learn from labeled AIS data, I set up a CNN aiming to classify ships by category, given their trajectory. For now, this is not completely functional since the dataset is really restricted. However, the method is scalable. First, I designed a Streamlit web app (the code is available on my GitHub) to display vessel trajectories and densities given their category. I removed categories of vessel…

    Permanent link to “Vessel classification using AIS data”
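Before a CNN can classify trajectories, each track has to be turned into a fixed-size image. A NumPy sketch of that preprocessing step, in the spirit of Chen et al. (2020); the function name and grid size are illustrative, not the post's code:

```python
import numpy as np

def trajectory_to_grid(points, grid_size=32):
    """Rasterize a vessel trajectory (sequence of (lon, lat) fixes) into a
    grid_size x grid_size occupancy image, normalized to the track's own
    bounding box -- the kind of input a small classification CNN expects."""
    pts = np.asarray(points, dtype=float)
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    span = np.where(maxs - mins == 0, 1.0, maxs - mins)  # avoid divide-by-zero
    scaled = (pts - mins) / span * (grid_size - 1)
    grid = np.zeros((grid_size, grid_size), dtype=np.float32)
    cols, rows = scaled.astype(int).T                    # lon -> column, lat -> row
    grid[rows, cols] = 1.0                               # mark visited cells
    return grid

track = [(11.8, 57.6), (11.9, 57.65), (12.0, 57.7)]
img = trajectory_to_grid(track, grid_size=8)
print(int(img.sum()))  # 3 cells visited
```

Stacking one such image per vessel, labeled with its AIS ship category, yields the training tensor for the classifier.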