Site Reliability Engineer
Company: Writer
Location: San Francisco
Posted on: February 1, 2025
Job Description:
About Writer
Writer is the full-stack generative AI platform
delivering transformative ROI for the world's leading enterprises.
Named one of the top 50 companies in AI by Forbes and one of the
best places to work by Inc. Magazine, Writer empowers hundreds of
customers like Accenture, Intuit, L'Oreal, Mars, Salesforce, and
Vanguard to transform the way they work.
Writer's fully integrated
solution makes it easy to deploy secure and reliable AI
applications and agents that solve mission-critical business
challenges. Our suite of development tools is powered by Palmyra -
Writer's state-of-the-art family of LLMs - alongside our
industry-leading graph-based RAG and customizable AI
guardrails.
Founded in 2020 with office hubs in San Francisco, New
York City, Austin, Chicago, and London, our team of over 250
employees thinks big and moves fast, and we're looking for smart,
hardworking builders and scalers to join us on our journey to
create a better future of work.
About this role
We are looking for a
foundational member of the Cloud Infrastructure team at Writer.
This role will involve contributing to the development and
implementation of our Site Reliability Engineering (SRE) program.
The ideal candidate will ensure the reliability, scalability,
performance, and security of Writer's critical systems, taking a
proactive approach to guarantee that our high-ROI products reach
our customers seamlessly.
Your responsibilities:
- Lead the design, implementation, and maintenance of Writer,
Inc.'s cloud infrastructure to ensure high availability and
performance.
- Design and implement scalable cloud automation to support
seamless deployment for our largest enterprise customers.
- Automate infrastructure provisioning and management using
Terraform & Python.
- Collaborate with development teams to optimize cloud resources
and enhance system reliability.
- Develop and maintain monitoring and alerting systems to
proactively identify and resolve issues affecting the reliability
of our writing solutions.
- Conduct post-mortem analyses of system failures to identify
root causes and implement preventive measures.
- Optimize and scale our cloud infrastructure to support growing
user demand and ensure cost efficiency.
- Ensure the security and compliance of our systems, adhering to
industry standards and regulations.
- Provide mentorship and technical guidance to junior engineers,
fostering a culture of reliability and continuous improvement.
- Stay current with emerging technologies and industry trends to
continuously improve our site reliability practices.
Is this you?
- Proven expertise in Site Reliability Engineering with a minimum
of 7 years of hands-on experience.
- Deep understanding of system architecture and infrastructure
design to ensure high availability and performance.
- Bachelor's degree in Computer Science, Engineering, or a
related technical field.
- Strong proficiency in programming languages such as Python,
Java, or Go for automation and monitoring.
- Experience with cloud platforms like AWS, Azure, or GCP, and
their respective services for scalable and resilient systems.
- Expertise in containerization technologies (e.g., Docker,
Kubernetes) and orchestration tools.
- Knowledge of monitoring and logging tools (e.g., Prometheus,
Grafana, ELK Stack) to maintain system health and performance.
- Ability to lead and mentor junior engineers in best practices
for reliability and system optimization.
- Excellent communication skills to collaborate effectively with
cross-functional teams and stakeholders.
- Proactive approach to identifying and mitigating potential
system failures and performance bottlenecks.
Preferred Skills & Experience:
- Software engineering expertise.
- Terraform.
- Python.
- Kubernetes.
- Scala.
- AWS/GCP.
Curious to learn more about who we are and how we operate?
Benefits & perks
- Generous PTO, plus company holidays.
- Medical, dental, and vision coverage for you and your
family.
- Paid parental leave for all parents (12 weeks).
- Fertility and family planning support.
- Early-detection cancer testing.
- Flexible spending account and dependent FSA options.
- Health savings account for eligible plans with company
contribution.
- Annual work-life stipends for:
  - Home office setup, cell phone, internet.
  - Wellness stipend for gym, massage/chiropractor, personal
    training, etc.
  - Learning and development stipend.
- Company-wide off-sites and team off-sites.
- Competitive compensation, company stock options and 401k.
Writer
is an equal-opportunity employer and is committed to diversity. We
don't make hiring or employment decisions based on race, color,
religion, creed, gender, national origin, age, disability, veteran
status, marital status, pregnancy, sex, gender expression or
identity, sexual orientation, citizenship, or any other basis
protected by applicable local, state or federal law. Under the San
Francisco Fair Chance Ordinance, we will consider for employment
qualified applicants with arrest and conviction records.
By submitting your application on the application page, you
acknowledge and agree to .