Site Reliability Engineer / SRE
Oqton is an international, fast-growing scale-up and the Engineering branch of 3DSystems, with offices in San Francisco, Atlanta, Ghent, Copenhagen and Shanghai. Our AI-powered software platform allows manufacturing facilities to fully automate, integrate and monitor their machines and workflows.
To achieve our ambitious goals, we are looking for an experienced Site Reliability Engineer to join the team.
If this sounds exciting and you feel like joining a fast-paced, fast-growing startup, we should talk!
As part of the Oqton Engineering team, you will contribute to critical aspects of Oqton’s services throughout the entire development cycle:
- You will engineer, deploy, maintain and run parts of our stack: software built in-house, open-source-based solutions and off-the-shelf stacks
- You will engineer cloud setups on GCP, Aliyun, Azure, AWS and others, using cloud services
- You will engineer Kubernetes-based solutions
- You will be a production and operations domain expert to the development teams, guiding and advising on how to best implement services, serve our customers, and iterate once a service is in production
- You will work with the engineers and the teams to achieve reliability through engineering, monitoring, logging and reporting
- You will respond to problems, analyse and debug them, and work to resolve them
What we seek:
- Bachelor’s or master’s degree in computer science, IT or a similar field
- Multiple years of experience working in similar roles
- Understanding of deployments at scale
- Security-first mindset
- Experience in DevOps/GitOps, site reliability and/or infrastructure engineering in a quickly growing company using cloud-native technologies
- Strong knowledge of Linux, and basic knowledge of other operating systems
- Experience with infrastructure as code using tooling and frameworks (Terraform, Helm...)
- Experience with deploying, using and debugging Kubernetes in production
- Experience with public cloud providers (GCP, AWS, Azure or Aliyun)
- Experience with observability, monitoring and logging stacks in production at scale
- Effective with CI/CD pipelines (e.g. CircleCI, GitHub Actions, Travis CI or GitLab)
- Experience with Docker and working in a highly containerised environment in production
- Proficiency in scripting (Bash, Zsh, other scripting languages)
- Experience with MongoDB, Elasticsearch, Pulsar, Kafka, Flink... is a plus
- Understanding of infrastructure core components: systems, storage, networking, DNS, virtualisation, containers...
You are a good fit if:
- You have a strong sense of accountability and take ownership of your work
- You are a team player who loves working with others to find the right solutions
- You are a quick learner who is self-motivated and possesses a strong sense of ownership and responsibility
- You have strong analytical and problem-solving skills that drive elegant and maintainable solutions
- You have experience working with geographically and culturally diverse teams