About the job
Overview:
About Hebbia
The user interface for AGI – Hebbia is AI that works the way you work.
Designed to be generally capable, it can tackle even the most complex tasks, citing answers across any number of sources. By showing its work, Hebbia empowers users to collaborate with AI on each step and validate responses instead of blindly trusting them. Our mission is to put capable AI in the hands of 1 billion people by 2030.
Job Description
As a highly skilled Site Reliability Engineer (SRE), you will build systems that optimize the uptime and reliability of our platform, and lead the management and optimization of our DevOps and infrastructure operations. You will be responsible for owning our deployment pipelines, building and maintaining our continuous integration and continuous deployment (CI/CD) systems, ensuring the reliability and performance of our services, enhancing our observability, supporting our local development environments, and bolstering our security posture. Your technical expertise, leadership abilities, and problem-solving skills will contribute to the success of our AI products and shape the future of our technology stack.
This role is based out of our New York City office in Soho.
Responsibilities
- Own and manage deployment pipelines to ensure seamless and efficient software releases.
- Implement and maintain observability solutions for monitoring system performance and reliability.
- Support and enhance local development environments to streamline developer workflows.
- Collaborate with development teams to ensure infrastructure meets the needs of ongoing projects.
- Enhance the security posture of our infrastructure through proactive measures and regular audits.
- Develop and maintain automation scripts and tools to improve operational efficiency.
- Troubleshoot and resolve infrastructure and application issues to minimize downtime and ensure smooth operations.
- Continuously evaluate and integrate new technologies to improve the scalability, reliability, and security of our infrastructure.
Who You Are
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- 5+ years of software development experience at a venture-backed startup or top technology firm.
- Proven experience as a Site Reliability Engineer, DevOps Engineer, or similar role.
- Strong expertise in managing CI/CD pipelines and deployment automation.
- Proficiency in cloud platforms such as AWS, Azure, or Google Cloud (we are an AWS shop).
- Solid understanding of containerization and orchestration technologies such as Docker and Kubernetes.
- Experience with monitoring and observability tools such as Datadog, Prometheus, Grafana, or similar.
- Knowledge of infrastructure-as-code (IaC) tools such as Terraform or CloudFormation.
- Familiarity with security best practices and tools for infrastructure and application security.
- Excellent problem-solving skills and the ability to troubleshoot complex issues.
- Strong communication skills and the ability to work effectively in a collaborative environment.
- A proactive and self-motivated approach to learning and adopting new technologies.
- Passion for continuous improvement and operational excellence.
Become a Trusted Member, apply to jobs, and earn token rewards
Create and customize your member profile.
Earn 500 Outdefine tokens for becoming a Trusted Member and completing your assessment.
Once you are a Trusted Member, you can start applying to jobs.