Posts tagged with “python”

Building a Fish Information Fetching Web Application with Flask

In this post, I'll walk through a Python web application that fetches detailed information about fish species from FishBase, and then summarizes the data using OpenAI's GPT-3.5 model. This application is built using the Flask framework and includes web scraping and natural language processing.

Setting Up Flask

First, we import the necessary libraries and set up our Flask application:

from flask import Flask, request, jsonify, render_template
import requests
from bs4 import BeautifulSoup
from openai import OpenAI
import wikipediaapi

app = Flask(__name__)

Fetching Fish Information from FishBase

The get_fish_info function takes the species name as input, constructs the URL for FishBase, and scrapes the information using BeautifulSoup:

def get_fish_info(species):
    url = f'https://www.fishbase.se/summary/{species}'
    response = requests.get(url)
    
    if response.status_code != 200:
        return {"error": "Species not found or failed to fetch data"}
    
    soup = BeautifulSoup(response.content, 'html.parser')
    # Strip non-content elements before extracting the page text
    for s in soup(['style', 'script', 'head', 'title']):
        s.extract()
    info_fishbase = soup.get_text()
    
    return {
        "info": info_fishbase,
    }

Summarizing the Information with OpenAI GPT-3.5

To provide a concise summary of the fetched data, I use OpenAI's GPT-3.5. The generate_summary function interacts with the OpenAI API to generate the summary:

def generate_summary(info):
    api_key = 'YOUR_OPENAI_API_KEY'  # Replace with your actual OpenAI API key
    client = OpenAI(api_key=api_key)
    
    prompt = f"Generate a concise summary for the following fish information:\n\n{info}. The summary must contain all the important features and data about the species, based on the information given."

    completion = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=prompt,
        max_tokens=150
    )
    return completion.choices[0].text.strip()

Creating Routes in Flask

I define two routes: one for the home page and another for fetching the fish information:

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/fetch', methods=['POST'])
def fetch():
    data = request.json
    species = data['species'].replace(' ', '-')
    
    fish_info = get_fish_info(species)
    if 'error' in fish_info:
        return jsonify(fish_info), 404

    summary = generate_summary(fish_info['info'])
    fish_info['summary'] = summary

    return jsonify(fish_info)

Running the Application

Finally, we run the Flask application in debug mode:

if __name__ == '__main__':
    app.run(debug=True)
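
Once the server is running, the /fetch endpoint can be tried directly, for example with a small requests snippet (the species name here is just an example):

import requests

resp = requests.post(
    'http://127.0.0.1:5000/fetch',
    json={'species': 'Salmo salar'},  # example species; the route converts spaces to '-'
)
print(resp.json()['summary'])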

From that, I built a simple web app (the whole project is available on my GitHub):

The length of the summary is determined by the max_tokens budget allowed for the model's response. In this example, the token limit caused the last sentence to be incomplete.
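
If a longer, complete summary is needed, that budget can simply be raised in the call above, for example:

completion = client.completions.create(
    model='gpt-3.5-turbo-instruct',
    prompt=prompt,
    max_tokens=300  # a larger budget than the 150 tokens used above
)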


Open Fisheries global fish capture landings

Open Fisheries is a platform that compiles global fishery data, offering records of global fish capture landings from 1950 onwards. I have converted the great work from the rfisheries R package to Python to facilitate data analysis.

You can use the API to gather the total fish capture landings for a specific country. For example, the following chart shows the total landings for Canada:
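
Here is a minimal sketch of that call, assuming the same landings endpoints that the rfisheries package wraps (country totals are keyed by ISO3 code):

import requests
import pandas as pd
import matplotlib.pyplot as plt

# Yearly total landings for Canada (ISO3 code 'CAN'), returned as a list of {year, catch}
url = 'https://openfisheries.org/api/landings/countries/CAN.json'
landings = pd.DataFrame(requests.get(url).json())

landings.plot(x='year', y='catch', legend=False)
plt.ylabel('Total landings')
plt.title('Canada - total fish capture landings')
plt.show()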

The API can also be used to gather the total fish capture landings for different species. In this example, we look at the total landings for three species: Dentex dentex (DEC), Dentex congoensis (DNC), and Dentex macrophtalmus (DEL):
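
A similar call works for species, keyed by their three-letter FAO code (continuing from the snippet above):

# Yearly landings for Dentex dentex, identified by its FAO 3-alpha code
url = 'https://openfisheries.org/api/landings/species/DEC.json'
dentex = pd.DataFrame(requests.get(url).json())
dentex.plot(x='year', y='catch', legend=False, title='Dentex dentex (DEC) landings')
plt.show()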

As an illustrative example, I focused on species that are assessed globally and present in France (the list can be accessed here).

To better understand the conservation status of these species, I grouped them according to their IUCN status. The IUCN Red List categorizes species based on their risk of extinction, helping to guide conservation efforts. The statuses range from Least Concern to Critically Endangered, providing a critical framework for assessing biodiversity.

The IUCN Red List statuses are: Least Concern (LC), Near Threatened (NT), Vulnerable (VU), Endangered (EN), Critically Endangered (CR), Extinct in the Wild (EW), and Extinct (EX).

Calling the Open Fisheries API, I was able to retrieve catch trends of species based on their conservation status, highlighting the need for targeted management and conservation strategies.
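
A minimal sketch of that aggregation, with a hand-built (and purely illustrative) mapping from FAO codes to IUCN categories:

# Illustrative mapping from FAO 3-alpha codes to IUCN statuses (not real assessments)
iucn_status = {'DEC': 'Vulnerable', 'DNC': 'Least Concern', 'DEL': 'Least Concern'}

frames = []
for code, status in iucn_status.items():
    url = f'https://openfisheries.org/api/landings/species/{code}.json'
    df = pd.DataFrame(requests.get(url).json())
    df['status'] = status
    frames.append(df)

# Sum landings per year within each IUCN category
by_status = (pd.concat(frames)
               .groupby(['status', 'year'], as_index=False)['catch']
               .sum())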

Here is a visual representation of the catch data by IUCN status:

To continue this work, I am looking for data on global fishing intensities, which would enable the calculation of Maximum Sustainable Yield (MSY) and other key statistics for more effective fisheries management. Unfortunately, I haven't been able to find this data yet (maybe I should look into Ray Hilborn's global fisheries database); its availability would significantly strengthen the analysis.

My Python code for this project is available on my GitHub repository. Feel free to check it out here.


Vessel classification using AIS data

As I was gaining experience in machine learning, I found myself tumbling down a rabbit hole centered on Convolutional Neural Networks (CNNs). In particular, I wanted to experiment with these networks on spatial patterns. I stumbled upon the MovingPandas library for movement data analysis, which provides incredibly intuitive tutorials. I focused my attention on the example using AIS data published by the Danish Maritime Authority, recorded on 5 July 2017 near Gothenburg.

It was a good starting point for applying a CNN. Inspired by the work of Chen et al. (2020), who trained a neural network on labeled AIS data, I built a CNN aiming to classify ships into categories from their trajectories. For now, it is not completely functional since the dataset is very small; however, the method is scalable.

First, I designed a Streamlit web app (the code is available on my GitHub) to display vessel trajectories and densities by category.
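
Here is a minimal sketch of such an app; the column names (MMSI, Timestamp, Latitude, Longitude, ShipType) are assumptions about how the AIS CSV is prepared, not necessarily the exact ones used in my repo:

import streamlit as st
import pandas as pd
import geopandas as gpd
import movingpandas as mpd

@st.cache_data
def load_trajectories(path='ais.csv'):
    df = pd.read_csv(path, parse_dates=['Timestamp'])
    gdf = gpd.GeoDataFrame(
        df, geometry=gpd.points_from_xy(df.Longitude, df.Latitude), crs='EPSG:4326'
    )
    # One trajectory per vessel (MMSI), ordered by timestamp
    return mpd.TrajectoryCollection(gdf, 'MMSI', t='Timestamp')

trajs = load_trajectories()
ship_type = st.selectbox('Ship type', ['Cargo', 'Tanker', 'Passenger', 'Fishing'])
selected = trajs.filter('ShipType', [ship_type])  # keep only that category

ax = selected.plot(linewidth=0.5)  # MovingPandas plots all trajectories on one axis
st.pyplot(ax.figure)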

[Screenshot of the Streamlit app]

I removed vessel categories without enough samples. I also filtered out trajectories whose duration is less than half the median duration.
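
Sketched out (continuing from the TrajectoryCollection trajs above, and still assuming a ShipType column), the filtering looks roughly like this:

# Duration and category of each trajectory
durations = pd.Series({t.id: t.get_duration() for t in trajs.trajectories})
categories = pd.Series({t.id: t.df['ShipType'].iloc[0] for t in trajs.trajectories})

min_duration = durations.median() / 2
counts = categories.value_counts()

kept = [
    t for t in trajs.trajectories
    if counts[categories[t.id]] >= 10      # arbitrary minimum number of samples per category
    and durations[t.id] >= min_duration    # at least half the median duration
]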

To train the CNN, I need to turn the trajectories into images of 128x128 pixels. I split the trajectories into segments of half the median duration (~9 h). For each segment, I iterate over the time steps and fill the corresponding pixel, mapping the pixel matrix onto the min/max longitude and latitude available in the data. I discard all motionless trajectories.
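
A rough sketch of that rasterization step (assuming each segment is a MovingPandas Trajectory and the dataset's bounding box is known):

import numpy as np

def trajectory_to_image(traj, bounds, size=128):
    """Rasterize one trajectory segment into a size x size binary image."""
    min_lon, min_lat, max_lon, max_lat = bounds  # bounding box of the whole dataset
    img = np.zeros((size, size), dtype=np.float32)
    for point in traj.df.geometry:
        # Map longitude/latitude onto a pixel index in [0, size - 1]
        col = int((point.x - min_lon) / (max_lon - min_lon) * (size - 1))
        row = int((point.y - min_lat) / (max_lat - min_lat) * (size - 1))
        img[row, col] = 1.0
    return img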

[Screenshot of the Streamlit app]

Once I have all the images, I set up my CNN:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(num_classes, activation='softmax')  # num_classes = number of vessel categories
])

This is a classical CNN architecture, alternating convolution layers (feature extraction) and pooling layers (dimensionality reduction). You'll find a good summary of pooling layers here.

Then, I compile and fit the model:

model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # multi-class classification
              metrics=['accuracy'])

history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    epochs=10
)

Unfortunately, my dataset was not big enough to yield accurate statistics, but the model seems able to recognize some patterns:

[Model output]

This model might be improved by encoding speed and acceleration in the color channels. I should also use data augmentation techniques (rotations, for instance) to enlarge the dataset.
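
For the augmentation idea, something like Keras' ImageDataGenerator could produce rotated and flipped copies of the trajectory images; a sketch (the folder layout is hypothetical):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random rotations and flips to artificially enlarge the small trajectory dataset
augmented = ImageDataGenerator(
    rotation_range=90,      # rotate trajectory images by up to 90 degrees
    horizontal_flip=True,
    vertical_flip=True,
    rescale=1.0 / 255,
)
train_generator = augmented.flow_from_directory(
    'trajectory_images/train',  # hypothetical folder of images sorted by class
    target_size=(128, 128),
    batch_size=32,
    class_mode='categorical',
)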