gocolly / colly
Elegant Scraper and Crawler Framework for Golang
Colly
Lightning Fast and Elegant Scraping Framework for Gophers
Colly provides a clean interface to write any kind of crawler/scraper/spider.
With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.
Features
- Clean API
- Fast (>1k requests/sec on a single core)
- Manages request delays and maximum concurrency per domain
- Automatic cookie and session handling
- Sync/async/parallel scraping
- Caching
- Automatic conversion of non-Unicode responses to UTF-8
- Robots.txt support
- Distributed scraping
- Configuration via environment variables
- Extensions
Example
```go
package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	// Print each URL before its request is sent
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	c.Visit("http://go-colly.org/")
}
```
See the examples folder for more detailed examples.
Installation
```sh
go get github.com/gocolly/colly/v2
```
Bugs
Bugs or suggestions? Visit the issue tracker or join #colly on Freenode.
Other Projects Using Colly
Below is a list of public, open source projects that use Colly:
- greenpeace/check-my-pages Scraping script to test the Spanish Greenpeace web archive.
- altsab/gowap Wappalyzer implementation in Go.
- jesuiscamille/goquotes A quotes scraper, making your day a little better!
- jivesearch/jivesearch A search engine that doesn't track you.
- Leagify/colly-draft-prospects A scraper for future NFL Draft prospects.
- lucasepe/go-ps4 Search the PlayStation Store for your favorite PS4 games using the command line.
- yringler/inside-chassidus-scraper Scrapes Rabbi Paltiel's website for lesson metadata.
- gamedb/gamedb A database of Steam games.
- lawzava/scrape CLI for email scraping from any website.
- eureka101v/WeiboSpiderGo A Sina Weibo (Chinese Twitter) scraper.
- Go-phie/gophie Search, download and stream movies from your terminal.
- imthaghost/goclone Clone websites to your computer within seconds.
- superiss/spidy Crawl the web and collect expired domains.
- docker-slim/docker-slim Optimize your Docker containers to make them smaller and better.
- seversky/gachifinder An agent for asynchronous scraping, parsing and writing to storage backends (Elasticsearch for now).
- eval-exec/goodreads Crawl all tags and all pages of quotes from Goodreads.
If you are using Colly in a project, please send a pull request to add it to the list.
Contributors
This project exists thanks to all the people who contribute. [Contribute]. <a href="https://github.com/gocolly/colly/graphs/contributors"><img src="https://opencollective.com/colly/contributors.svg?width=890" /></a>
Backers
Thank you to all our backers! 🙏 [Become a backer]
<a href="https://opencollective.com/colly#backers" target="_blank"><img src="https://opencollective.com/colly/backers.svg?width=890"></a>
Sponsors
Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]
<a href="https://opencollective.com/colly/sponsor/0/website" target="_blank"><img src="https://opencollective.com/colly/sponsor/0/avatar.svg"></a> <a href="https://opencollective.com/colly/sponsor/1/website" target="_blank"><img src="https://opencollective.com/colly/sponsor/1/avatar.svg"></a> <a href="https://opencollective.com/colly/sponsor/2/website" target="_blank"><img src="https://opencollective.com/colly/sponsor/2/avatar.svg"></a> <a href="https://opencollective.com/colly/sponsor/3/website" target="_blank"><img src="https://opencollective.com/colly/sponsor/3/avatar.svg"></a> <a href="https://opencollective.com/colly/sponsor/4/website" target="_blank"><img src="https://opencollective.com/colly/sponsor/4/avatar.svg"></a> <a href="https://opencollective.com/colly/sponsor/5/website" target="_blank"><img src="https://opencollective.com/colly/sponsor/5/avatar.svg"></a> <a href="https://opencollective.com/colly/sponsor/6/website" target="_blank"><img src="https://opencollective.com/colly/sponsor/6/avatar.svg"></a> <a href="https://opencollective.com/colly/sponsor/7/website" target="_blank"><img src="https://opencollective.com/colly/sponsor/7/avatar.svg"></a> <a href="https://opencollective.com/colly/sponsor/8/website" target="_blank"><img src="https://opencollective.com/colly/sponsor/8/avatar.svg"></a> <a href="https://opencollective.com/colly/sponsor/9/website" target="_blank"><img src="https://opencollective.com/colly/sponsor/9/avatar.svg"></a>