
| name | type | domain | format | maintained_by | repo |
|---|---|---|---|---|---|
| agent-data-sources | awesome-list | ai-agent-infrastructure | markdown | chrbailey | |

Agent Data Sources

A curated list of structured, machine-readable data feeds designed for ingestion by AI agents, monitoring systems, and research pipelines.

Every entry has been verified to return a parseable feed. No paywalled APIs. No feeds that require authentication to read. If a feed goes dead, open an issue.


AI Trend Intelligence

| Name | Format | Frequency | Description |
|---|---|---|---|
| deeptrend | JSON Feed, RSS, llms.txt | Every 6 hours | Structured AI trend intelligence synthesized from 14+ sources via LLM Counsel. Publishes JSON Feed, RSS, `hot.json`, and `llms.txt`. |

```bash
# Quick test
curl -s https://chrbailey.github.io/deeptrend/hot.json | jq '.topics[:3]'
```

AI Research

| Name | Format | Frequency | Description |
|---|---|---|---|
| arXiv cs.AI | RSS | Daily | New papers in Artificial Intelligence. Feed: https://export.arxiv.org/rss/cs.AI |
| arXiv cs.CL | RSS | Daily | New papers in Computation and Language (NLP/LLMs). Feed: https://export.arxiv.org/rss/cs.CL |
| arXiv cs.LG | RSS | Daily | New papers in Machine Learning. Feed: https://export.arxiv.org/rss/cs.LG |
| HuggingFace Daily Papers | Web (HTML) | Daily | Community-upvoted research papers with discussion threads. No feed; scrape the page or use the HF API. |
| HuggingFace Blog | RSS | Weekly | Technical posts on models, datasets, and tooling. Feed: https://huggingface.co/blog/feed.xml |
| Semantic Scholar API | JSON API | Real-time | Academic paper search with citation graphs, abstracts, and influence scores. Free tier: 100 requests per 5 minutes. |

```bash
# arXiv: latest AI papers
curl -s "https://export.arxiv.org/rss/cs.AI" | head -50

# Semantic Scholar: search for transformer papers (results are ranked by relevance, not date)
curl -s "https://api.semanticscholar.org/graph/v1/paper/search?query=transformer+architecture&limit=5&fields=title,year,citationCount" | jq .
```
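
The Semantic Scholar free tier works out to roughly one request every 3 seconds (100 requests per 300 seconds). A minimal sketch of staying under that ceiling when running several searches in a row, assuming POSIX `sh` and `curl`; `throttled_each`, `s2_search`, and the `S2_DELAY` variable are hypothetical names, not part of any API. It also lets `curl -G --data-urlencode` encode the query instead of writing `+` by hand.

```bash
# Hypothetical helper: run "$@" once per line of stdin, pausing between calls.
# Default gap: 300 s / 100 requests = 3 s, from the free-tier limit quoted above.
S2_DELAY=${S2_DELAY:-3}

throttled_each() (
  first=1
  while IFS= read -r line; do
    # Sleep before every call except the first one.
    [ "$first" -eq 1 ] || sleep "$S2_DELAY"
    first=0
    # </dev/null stops the command from consuming the remaining input lines.
    "$@" "$line" </dev/null
  done
)

# Hypothetical wrapper around the search endpoint used above; -G turns the
# -d/--data-urlencode pairs into a query string on a GET request.
s2_search() {
  curl -s -G "https://api.semanticscholar.org/graph/v1/paper/search" \
    --data-urlencode "query=$1" \
    -d limit=5 -d fields=title,year,citationCount
}

# Usage:
# printf '%s\n' 'transformer architecture' 'mixture of experts' | throttled_each s2_search
```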

Developer News

| Name | Format | Frequency | Description |
|---|---|---|---|
| Hacker News: Front Page | RSS | ~1 min | Top-ranked stories from the HN front page. Feed: https://hnrss.org/frontpage |
| Hacker News: Newest | RSS | ~1 min | All new HN submissions in chronological order. Feed: https://hnrss.org/newest |
| Hacker News: Best Comments | RSS | ~1 min | Highest-rated HN comments. Feed: https://hnrss.org/bestcomments |
| Techmeme | RSS | ~15 min | Algorithmically curated top tech news with source clustering. Feed: https://www.techmeme.com/feed.xml |
| Lobsters | RSS | ~5 min | Invite-only tech link aggregator with a strong signal-to-noise ratio. Feed: https://lobste.rs/rss |

```bash
# HN front page: titles only (the first line is the feed's own <title>; grep -P requires GNU grep)
curl -s "https://hnrss.org/frontpage" | grep -oP '(?<=<title>).*?(?=</title>)' | head -10

# Techmeme: current headlines
curl -s "https://www.techmeme.com/feed.xml" | grep -oP '(?<=<title>).*?(?=</title>)' | head -10
```

Tip: hnrss.org supports query parameters for filtering: https://hnrss.org/frontpage?q=LLM returns only stories matching "LLM". See hnrss.org for full docs.
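
Search terms with spaces or symbols need percent-encoding before they go into the `q` parameter. A minimal sketch in POSIX `sh`, assuming only the `q` parameter described in the tip; `urlencode` and `hn_search_feed` are hypothetical helper names, and the encoder handles ASCII input only (multibyte characters would need byte-level encoding).

```bash
# Hypothetical helper: percent-encode an ASCII string per RFC 3986
# (unreserved characters pass through, everything else becomes %XX).
urlencode() (
  s=$1 out=
  while [ -n "$s" ]; do
    c=${s%"${s#?}"}   # first character of s
    s=${s#?}          # drop it from s
    case $c in
      [A-Za-z0-9.~_-]) out=$out$c ;;
      # "'$c" makes printf emit the character's numeric code.
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
)

# Hypothetical helper: filtered hnrss front-page feed URL for a search term.
hn_search_feed() {
  printf 'https://hnrss.org/frontpage?q=%s\n' "$(urlencode "$1")"
}

# Usage: curl -s "$(hn_search_feed 'LLM agents')"
```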

AI Expert Commentary

| Name | Format | Frequency | Description |
|---|---|---|---|
| Simon Willison's Weblog | Atom | Daily | Prolific commentary on LLMs, developer tools, and AI policy from the creator of Datasette. Feed: https://simonwillison.net/atom/everything/ |
| Import AI | RSS | Weekly | Newsletter on AI policy, research, and capabilities by Jack Clark (Anthropic co-founder). Feed: https://jack-clark.net/feed/ |
| AlphaSignal | RSS (Substack) | Weekly | Curated AI research and engineering newsletter. Feed: https://alphasignal.substack.com/feed |
| Last Week in AI | RSS | Weekly | Comprehensive roundup of AI news, research, and industry developments. Feed: https://lastweekin.ai/feed |
| Machine Learning (Substack) | RSS | 2-3x/week | ML research highlights and analysis. Feed: https://machinelearning.substack.com/feed |

```bash
# Simon Willison: latest posts
curl -s "https://simonwillison.net/atom/everything/" | grep -oP '(?<=<title>).*?(?=</title>)' | head -5
```

Tip: Most Substack newsletters expose an RSS feed at https://<name>.substack.com/feed. If you follow an AI author on Substack, try that pattern.
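
The pattern in the tip can be wrapped in a small helper that accepts either a bare publication name or any URL on the publication's `substack.com` subdomain. A minimal sketch; `substack_feed` is a hypothetical name, and newsletters served from a custom domain are not handled.

```bash
# Hypothetical helper: map a Substack name or URL to its /feed address.
# Accepts "name", "https://name.substack.com", or a post URL on that subdomain.
substack_feed() (
  name=${1#https://}
  name=${name#http://}
  name=${name%%.substack.com*}   # keep only the subdomain part
  printf 'https://%s.substack.com/feed\n' "$name"
)

# Usage: curl -s "$(substack_feed alphasignal)" | head -20
```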

Code & Repos

| Name | Format | Frequency | Description |
|---|---|---|---|
| GitHub Trending | Web (HTML) | Daily | Top trending repositories by language and time range. No official RSS feed or API; use an unofficial trending API or scrape the page. |
| GitHub Events API | JSON API | Real-time | Public event stream (pushes, stars, forks, issues). Rate limits: 60 requests/hour unauthenticated, 5,000/hour with a token. |
| GitLab Explore | Web (HTML) | Continuous | Trending and most-starred public GitLab projects. |

```bash
# GitHub: public events for a repo
curl -s "https://api.github.com/repos/anthropics/claude-code/events" | jq '.[0] | {type, created_at, actor: .actor.login}'

# GitHub Trending: unofficial third-party API (not operated by GitHub)
curl -s "https://api.gitterapp.com/repositories?since=daily" | jq '.[0] | {name: .fullName, stars: .stars, description}'
```
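
GitHub reports the remaining Events API budget in response headers, so a polling loop can check it before its next call. A minimal sketch, assuming GitHub's `x-ratelimit-remaining` header and, if you have one, a token in a `GITHUB_TOKEN` environment variable; `gh_api` and `ratelimit_remaining` are hypothetical helper names.

```bash
# Hypothetical wrapper: call the GitHub REST API, sending a token when one is
# set (5,000 requests/hour instead of 60).
gh_api() {
  if [ -n "${GITHUB_TOKEN:-}" ]; then
    curl -s -H "Authorization: Bearer $GITHUB_TOKEN" "https://api.github.com$1"
  else
    curl -s "https://api.github.com$1"
  fi
}

# Read an HTTP header dump on stdin and print the x-ratelimit-remaining value.
# Header names are matched case-insensitively and the trailing CR is stripped.
ratelimit_remaining() {
  awk -F': *' 'tolower($1) == "x-ratelimit-remaining" { sub(/\r$/, "", $2); print $2 }'
}

# Usage: -D - writes response headers to stdout, -o /dev/null discards the body.
# curl -s -D - -o /dev/null "https://api.github.com/repos/anthropics/claude-code/events" | ratelimit_remaining
# gh_api /repos/anthropics/claude-code/events | jq '.[0].type'
```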

First-Party AI Labs

| Name | Format | Frequency | Description |
|---|---|---|---|
| Google Research Blog | RSS | 1-2x/week | Research announcements from Google Research. Feed: https://research.google/blog/rss/ |
| Google DeepMind Blog | RSS | 1-2x/week | Research updates from Google DeepMind (Gemini, AlphaFold, etc.). Feed: https://deepmind.google/blog/rss.xml |
| BAIR Blog | RSS | 1-2x/month | Berkeley AI Research lab posts on robotics, NLP, vision, and RL. Feed: https://bair.berkeley.edu/blog/feed.xml |
| HuggingFace Blog | RSS | Weekly | Open-source model releases, training guides, and tooling updates. Feed: https://huggingface.co/blog/feed.xml |

```bash
# Google Research: latest blog posts
curl -s "https://research.google/blog/rss/" | grep -oP '(?<=<title>).*?(?=</title>)' | head -5

# BAIR: latest posts
curl -s "https://bair.berkeley.edu/blog/feed.xml" | grep -oP '(?<=<title>).*?(?=</title>)' | head -5
```

Note on OpenAI and Anthropic: As of February 2026, neither OpenAI nor Anthropic publishes a public RSS feed for their blogs. For monitoring these, use a web-to-RSS bridge like Feedless or check community-maintained feeds.

Standards & Specs

These specifications define the formats used by the feeds above and are useful when building parsers and validators.

| Name | URL | Description |
|---|---|---|
| JSON Feed 1.1 | https://www.jsonfeed.org/version/1.1/ | Feed format using JSON instead of XML; easier to parse programmatically. |
| RSS 2.0 Specification | https://www.rssboard.org/rss-specification | The dominant XML-based feed syndication format. |
| Atom 1.0 (RFC 4287) | https://www.rfc-editor.org/rfc/rfc4287 | IETF standard for web feeds. More strictly specified than RSS: every entry must carry an ID, a title, and an updated timestamp. |
| llms.txt Specification | https://llmstxt.org/ | Proposed standard for providing LLM-readable site information, similar in spirit to robots.txt. |
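
A quick way to tell which of these formats a URL actually returns, or whether it returns plain HTML (which fails the first Contributing requirement), is to look for each spec's root marker. A minimal sketch in POSIX `sh`; `feed_format` is a hypothetical name, and it is a heuristic over the first 4 KiB of the body, not a validating parser.

```bash
# Hypothetical helper: guess a feed's format from its body on stdin.
# Prints one of: jsonfeed, rss, atom, html, unknown.
feed_format() (
  # Only the first 4 KiB is inspected; root markers appear near the top.
  body=$(head -c 4096)
  # Strip leading whitespace so a JSON body can be recognised by its first '{'.
  body=${body#"${body%%[![:space:]]*}"}
  case $body in
    # JSON Feed: a JSON object whose "version" is a jsonfeed.org URL.
    '{'*'jsonfeed.org/version/'*) echo jsonfeed ;;
    # RSS 2.0: <rss> root element (checked before Atom, since RSS feeds
    # often declare the Atom namespace for <atom:link>).
    *'<rss'*) echo rss ;;
    # Atom: <feed> root element in the Atom namespace.
    *'<feed'*'http://www.w3.org/2005/Atom'*) echo atom ;;
    *'<html'*|*'<HTML'*) echo html ;;
    *) echo unknown ;;
  esac
)

# Usage: curl -s https://lobste.rs/rss | feed_format
```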

Contributing

Found a feed that belongs here? Open a PR. Requirements:

  1. Machine-readable — must return structured data (RSS, Atom, JSON Feed, or JSON API), not just HTML
  2. Publicly accessible — no API keys required to read (free-tier APIs with generous limits are OK)
  3. Actively maintained — feed must have published content within the last 90 days
  4. Relevant to AI/ML practitioners — research, tooling, industry news, or infrastructure

Please include: name, URL, format, update frequency, and a one-line description.
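
Requirement 3 can be checked from a feed's newest date field. A minimal sketch, assuming GNU coreutils `date`, whose `-d` flag parses both RSS `pubDate` values and the RFC 3339 timestamps used by Atom and JSON Feed (BSD/macOS `date` has no `-d`); `is_fresh` is a hypothetical name, and its optional second argument only exists to make the check reproducible.

```bash
# Hypothetical helper (GNU date assumed): is_fresh DATE [NOW_EPOCH]
# Exits 0 if DATE is at most 90 days before NOW_EPOCH (default: now),
# 1 if it is older, 2 if DATE cannot be parsed.
is_fresh() (
  published=$(date -d "$1" +%s 2>/dev/null) || exit 2
  now=${2:-$(date +%s)}
  age_days=$(( (now - published) / 86400 ))
  [ "$age_days" -le 90 ]
)

# Usage (first <pubDate> in an RSS feed; grep -P requires GNU grep):
# d=$(curl -s https://lobste.rs/rss | grep -oP '(?<=<pubDate>).*?(?=</pubDate>)' | head -1)
# is_fresh "$d" && echo fresh || echo stale
```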


License

CC0

To the extent possible under law, the author has waived all copyright and related rights to this work.
