WORKSHOP – Web Scraping: Make the internet your playground

Name: WORKSHOP – Web Scraping: Make the internet your playground
Start: 2024-08-19T16:00:00+02:00
End: 2024-08-19T18:00:00+02:00
Location: datacraft –


Organizers

Raphael Vienne, Head of AI at datacraft
Rémy Gasmi, Data Scientist Intern at datacraft

Workshop introduction:

Scraping has gained increasing recognition since the rise of LLMs, as these models are pre-trained on several petabytes of internet data collected by web crawlers.

Every year, the internet produces huge amounts of extremely valuable data. Some people may want to collect relevant data from the internet automatically, or even to automate certain actions online.

Both of these goals can be achieved with scraping.

In this workshop, we will introduce participants to scraping and discuss the legal considerations surrounding this practice.

Workshop summary:

In this workshop, we will:

Introduce scraping libraries, along with the legal considerations around scraping (when not to scrape).
Start scraping with a simple example (extracting information from a wiki).
Carry out a more complex scraping pipeline (scraping the datacraft agenda and upcoming events).
Finally, let participants build their own scraping project (on the website of their choice).
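
To give a flavour of the first two steps, here is a minimal sketch using only the Python standard library. It is not the workshop's actual material: the function names, the sample robots.txt and the sample HTML are all hypothetical, and the libraries used during the session may differ. It first checks whether a robots.txt file permits fetching a URL (a common courtesy check, not a substitute for legal advice), then extracts section headings from a wiki-like page.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class HeadingExtractor(HTMLParser):
    """Collect the text of every <h2> heading in an HTML document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_h2 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <i>) also arrives here while in an <h2>.
        if self._in_h2:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self.headings.append("".join(self._buffer).strip())
            self._in_h2 = False


def extract_headings(html: str) -> list[str]:
    """Parse an HTML string and return its <h2> headings in document order."""
    extractor = HeadingExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.headings


# Hypothetical sample data standing in for a real site, so the sketch runs offline.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""

SAMPLE_HTML = """<html><body>
<h1>Web scraping</h1>
<h2>History</h2><p>...</p>
<h2>Techniques</h2><p>...</p>
<h2>Legal <i>issues</i></h2>
</body></html>"""

if __name__ == "__main__":
    url = "https://example.org/wiki/Web_scraping"  # placeholder URL
    if is_allowed(SAMPLE_ROBOTS, "workshop-bot", url):
        for heading in extract_headings(SAMPLE_HTML):
            print(heading)
```

In a real project the HTML would come from an HTTP request rather than a hard-coded string, and a dedicated parsing library would usually replace the hand-written parser. The structure stays the same: check whether you may scrape, fetch, then extract.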

If you have ever thought about automating an online task, or if you’re just curious about scraping with Python, this workshop is for you!

Come and benefit from our team's experience in this domain.
