Job Description

About Fusemachines

Fusemachines, established in 2013, is an enterprise AI company focused on making artificial intelligence more accessible. Using its proprietary AI Studio and AI Engines, the company supports organizations through AI-driven transformation at different stages of their digital journey. With teams across North America, Asia, and Latin America, Fusemachines offers enterprise AI products and specialized services that help businesses of all sizes build and scale AI solutions. Its clients span sectors including retail, manufacturing, and government.

The company also advances its mission by expanding access to quality AI education in underserved communities and helping organizations unlock their full potential through AI.

Role Overview

This is a mid- to senior-level Machine Learning Engineer / Data Scientist position focused on building and deploying machine learning solutions that deliver measurable business outcomes. The role spans the full ML lifecycle, including problem definition, data exploration, model development, evaluation, deployment, and monitoring. You will collaborate regularly with client stakeholders and internal delivery teams.

The ideal candidate has strong applied machine learning and data science experience, is comfortable working with real-world datasets, and can turn analytical work into production-ready solutions.

Key Responsibilities

Work with stakeholders to translate business challenges into machine learning problems such as classification, regression, forecasting, clustering, anomaly detection, and recommendation systems.
Define success metrics, evaluation approaches, and practical constraints such as latency, explainability, cost, and data limitations.
Use Python and SQL to pull, combine, and analyze data from relational databases and data warehouses.
Conduct data profiling, missing-value analysis, leakage checks, and exploratory analysis to inform modeling decisions.
Create reliable feature pipelines using aggregation, encoding, scaling, and embeddings where appropriate, while documenting assumptions clearly.
Develop and tune supervised learning models for structured data, including linear/logistic models, tree-based methods, gradient boosting models such as XGBoost, LightGBM, and CatBoost, and neural networks for tabular problems.
Apply best practices for tabular modeling, including handling missing data, categorical encoding, leakage prevention, class imbalance, calibration, and robust cross-validation.
Build and validate time series models using statistical, machine learning, or deep learning approaches with proper backtesting.
Apply clustering and segmentation techniques such as k-means, hierarchical clustering, DBSCAN, and Gaussian mixtures, and assess their stability and usefulness.
Use statistical methods such as hypothesis testing, confidence intervals, sampling, and experiment design to support inference and decisions.
Develop deep learning models with PyTorch or TensorFlow/Keras.
Follow sound training practices, including regularization, calibration, class imbalance handling, reproducibility, and careful train/validation/test design.
Choose the right evaluation metrics, prepare performance reports, and measure both model quality and business impact.
Perform error analysis and interpretation using tools such as feature importance, SHAP, and cohort slicing, then iterate based on findings.
Package models for deployment through batch scoring pipelines or real-time APIs and partner with engineering teams on integration.
Support practical MLOps by managing versioning, reproducible training, automated evaluation, drift and performance monitoring, and retraining plans.
Communicate tradeoffs, findings, and recommendations clearly to both technical and non-technical audiences.
Create documentation and lightweight demos that turn analytical results into actionable outputs.

Success in This Role

Models deliver measurable improvements such as revenue growth, cost reduction, risk reduction, better forecasting accuracy, or stronger operational efficiency.
Work is reproducible and built with production in mind, including clear data lineage, sound evaluation, and a practical path to deployment and monitoring.
Stakeholders have confidence in your method selection and your ability to communicate uncertainty honestly.

Required Qualifications

3 to 8 years of experience in data science, machine learning engineering, or applied machine learning.
Strong Python skills for analysis and modeling, with experience using pandas, NumPy, and scikit-learn or similar tools.
Strong SQL knowledge, including joins, window functions, aggregation, and query performance awareness.
Solid grounding in statistics, including hypothesis testing, uncertainty, bias-variance tradeoffs, and sampling, along with an experimentation mindset.
Practical experience across multiple modeling areas, including classification, regression, time series forecasting, and clustering or segmentation.
Hands-on experience with deep learning in PyTorch or TensorFlow/Keras.
Strong problem-solving ability and comfort working through ambiguous goals and messy data.
Clear communication skills and the ability to turn analysis into decisions.

Preferred Qualifications

Experience with Databricks for applied machine learning, including Spark, Delta Lake, MLflow, and Databricks Jobs/Workflows.
Experience deploying models to production and maintaining them through monitoring and retraining.
Experience with orchestration tools such as Airflow, Prefect, or Dagster, and with modern data platforms like Snowflake, BigQuery, Redshift, or Databricks.
Exposure to cloud platforms such as AWS, GCP, Azure, or IBM Cloud, plus containerization with Docker.
Knowledge of responsible AI and governance practices such as privacy and PII handling, auditability, and access control.
Consulting or client-facing delivery experience.

Certifications

Candidates with at least one relevant certification are especially encouraged to apply. Helpful certifications include cloud certifications from AWS, Google Cloud, Microsoft Azure, or IBM in data, AI, or ML tracks, as well as Databricks certifications for Data Scientist, Data Engineer, or related roles.

Nice-to-Have

Experience with causal inference methods such as quasi-experimental design, propensity scores, uplift modeling, heterogeneous treatment effects, or experimentation beyond A/B tests.
Experience designing and evaluating agentic workflows, including tool use, planning, memory or state handling, and guardrails, and integrating them into products.
Strong familiarity with AI-assisted development tools and coding workflows for faster product delivery, along with sound judgment around quality, security, and maintainability.

Immigration Sponsorship Policy

This role is not eligible for employment visa sponsorship or visa transfer sponsorship, now or in the future.

The company does not provide direct sponsorship for any visa type, including H-1B, J-1, and TN.
The company will not act as the employer of record on immigration paperwork.
Support letters or other documentation for work authorization, such as OPT, STEM OPT, or CPT, will not be provided.

Equal Opportunity

Fusemachines is an equal opportunity employer and is committed to diversity and inclusion. All qualified applicants will be considered without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or any other protected characteristic under applicable federal, state, or local law.

Data Scientist - Hybrid (3 days/week)

Where you'll work