Tasks
responsibility
All people listed should:
know how to (& have credentials to) restart and/or deploy bots
monitor bot-related rollbar notifications; check that any critical bugs are being addressed
understand error prioritization and know the failure playbook
importance (priority)
invariant fails (page @jalextowle @jrhea @mcclurejt)
checkpoint bot tries to checkpoint & fails
checkpoint bot goes down
invariant check bot goes down
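The ordering above could be encoded so that alerting code triages failures consistently. This is a minimal sketch, not existing bot code: the event names (`invariant_failed` and so on) are hypothetical, and it assumes only the first item pages people, as the list states.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Lower value means more urgent, mirroring the ordered list above."""

    INVARIANT_FAILED = 1
    CHECKPOINT_FAILED = 2
    CHECKPOINT_BOT_DOWN = 3
    INVARIANT_BOT_DOWN = 4


# Hypothetical event names; the real bots may report failures differently.
EVENT_PRIORITY = {
    "invariant_failed": Priority.INVARIANT_FAILED,
    "checkpoint_failed": Priority.CHECKPOINT_FAILED,
    "checkpoint_bot_down": Priority.CHECKPOINT_BOT_DOWN,
    "invariant_bot_down": Priority.INVARIANT_BOT_DOWN,
}

# Handles to page on an invariant failure, taken from the list above.
PAGE_ON_INVARIANT_FAILURE = ["@jalextowle", "@jrhea", "@mcclurejt"]


def triage(events):
    """Sort known events from most to least urgent; reject unknown ones."""
    unknown = [e for e in events if e not in EVENT_PRIORITY]
    if unknown:
        raise ValueError(f"unknown events: {unknown}")
    return sorted(events, key=lambda e: EVENT_PRIORITY[e])


def who_to_page(event):
    """Return the handles to page for an event (empty if nobody is paged)."""
    if EVENT_PRIORITY[event] is Priority.INVARIANT_FAILED:
        return list(PAGE_ON_INVARIANT_FAILURE)
    return []
```

Keeping the ordering in one place means a notification hook and any on-call documentation can both read from the same table.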
top priorities for mainnet
checkpoint bot & invariance check bot
runs
reporting system for when it goes down
secure credential management
documentation on how to (re)deploy bots
bots to consider
checkpoint
invariance check
lpandarb
this should be added after the other two are working well
documentation
uptime monitoring
easily-accessible location for cloud machine address & status
easily-accessible portal to view all deployed bot wallets
error reporting & notifications
notifications to critical team when bots go down (rollbar?)
system in place to assign responsibility for who should handle errors
easy start & restart
minimal steps to deploy new bots on a pool
ideally we would be able to run them on a mainnet fork on an AWS instance
containerized deployment
setup flag for "service bots"
invariant checks
rollbar filters for each check type
credentials storage
privileged access to private keys for bots
whoever sets this up is free to make the calls -- let's prioritize "easy" and "safe"
ideally use a free service, but if not then fine
easiest to use env vars
LastPass credentials for pauser
continuous deployment
nice to have
when infra pushes a release we deploy bots on a mainnet fork in AWS?
almost-continuous deployment -- make it easy for a dev to manually test deployment
current status -- checkpoint bot:
running in docker container
docker can restart automatically on failure (easily set up)
passes credentials via env variables set in infra repo
registry address, rpc uri (points to anvil node), private key, rollbar api key
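A bot could read the four values listed above at startup and fail fast if any is missing. This is a minimal sketch under assumptions: the environment variable names (`REGISTRY_ADDRESS`, `RPC_URI`, `PRIVATE_KEY`, `ROLLBAR_API_KEY`) and the `BotConfig` type are hypothetical and may not match what the infra repo actually sets.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True, repr=False)
class BotConfig:
    registry_address: str
    rpc_uri: str
    private_key: str
    rollbar_api_key: str

    def __repr__(self):
        # Mask secrets so the config can be logged without leaking keys.
        return (
            f"BotConfig(registry_address={self.registry_address!r}, "
            f"rpc_uri={self.rpc_uri!r}, private_key='***', rollbar_api_key='***')"
        )


# Hypothetical variable names; adjust to whatever the infra repo defines.
ENV_VARS = {
    "registry_address": "REGISTRY_ADDRESS",
    "rpc_uri": "RPC_URI",
    "private_key": "PRIVATE_KEY",
    "rollbar_api_key": "ROLLBAR_API_KEY",
}


def load_config(env=None):
    """Build a BotConfig from environment variables, listing any that are unset."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_VARS.values() if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return BotConfig(**{field: env[name] for field, name in ENV_VARS.items()})
```

Passing a plain dict as `env` makes the loader easy to exercise in tests without touching the real environment.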