Software QA • AI Model Evaluation • Remote Roles

Testing complex systems. Evaluating AI outputs. Turning ambiguity into actionable findings.

I’m Michael R. Williams, a Software Quality Engineer with more than twenty years of experience evaluating complex system behavior, identifying defects, analyzing edge cases, and helping teams deliver reliable software.

I’m seeking part-time, contract, consulting, or project-based remote opportunities in AI evaluation, quality analysis, and model assessment. I’m not trying to climb the ladder; I want to apply decades of evaluation and quality experience to improving AI systems.

A QA background built for AI evaluation

AI systems do not always behave predictably, so simple pass/fail validation is not enough. They require structured evaluation, careful review of inconsistent outputs, and the ability to translate ambiguous behavior into clear, useful feedback.

Structured Output Review

I evaluate whether outputs meet technical requirements, match user expectations, and are useful in real-world conditions.

Failure Pattern Detection

I look for inconsistencies, hallucinations, edge-case failures, unclear responses, and reliability issues.

Clear, Actionable Findings

I turn vague or inconsistent behavior into practical reports that product, engineering, and evaluation teams can use.

Hands-on generative AI experience

I have worked hands-on with generative AI tools, including ChatGPT, Nano Banana, SeeDream, and Grok, to generate, evaluate, and refine outputs across technical and creative use cases.

Prompt Iteration

Developing and refining prompts to improve the quality, clarity, consistency, and usefulness of generated outputs.

Quality Assessment

Reviewing AI output for accuracy, relevance, coherence, and potential failure modes.

Practical AI Application

Using multiple AI tools to develop product designs and marketing content for a niche print-on-demand business.

Core competencies

My strengths combine traditional software quality engineering with the type of analytical review needed for AI evaluation work.

Test Case Development

Root Cause Analysis

Requirements Development

Technical Documentation

Problem Troubleshooting

Issue Triage

SQL and Data Analysis

Project Metrics

Mentoring

Edge Case Analysis

Systems tested and supported

My work has covered retail, telecom, government, logistics, web applications, data warehousing, command-center systems, and network control environments.

Point-of-sale user interface software for Salesforce.com customers, including Les Schwab, NAPA, Hallmark, and Party City.

Verizon internal order processing and government ordering systems.

Customer-facing applications on the EchoStar / DISH Network website.

Asset management and tracking systems at WorldCom.

Network data warehousing, control, and monitoring systems at MCI.

Logistics systems for the Royal Saudi Air Force.

Space Forecast Center and Wing Command Center systems at Schriever Air Force Base.

Small vehicle dealer warranty processing and software-based manuals with interactive exploded parts views.

Professional experience

More than two decades of software quality, testing, engineering, technical leadership, and systems analysis experience.

2021 – 2023

Medical Leave and Recovery

Returned with a renewed focus on applying software quality expertise to AI evaluation and analysis roles.

2011 – 2021

Salesforce.com / Demandware / Tomax, Inc.

Principal Software QA Engineer / QA Lead — Salt Lake City, Utah

2009 – 2010

Independent Software Engineering

Software Engineer — Colorado Springs, Colorado

2007 – 2009

Sapphire Technology at Verizon Business

Contract Software Engineer — Colorado Springs, Colorado

2006 – 2007

EchoStar

Test Engineer — Englewood, Colorado

2003 – 2006

L&M Sales

Software Engineer — Colorado Springs, Colorado

1999 – 2002

WorldCom

Software Lead — Colorado Springs, Colorado

1997 – 1999

ISTC, Inc.

Corporate President / Principal Test Engineer — Colorado Springs, Colorado

1991 – 1997

MCI

Lead Test Engineer — Colorado Springs, Colorado

Let’s talk about AI quality.

I am especially interested in roles where I can contribute to improving model reliability, identifying failure patterns, and helping ensure that AI systems deliver accurate, useful, and responsible outputs.

Email Mike • Call 801-502-1783