🌱 Dive into Learning-Rich Sundays with groCTO ⤵️
Setting KPIs For Your Analytics
Ever feel like your software team's running blind? This article's your roadmap!
Wanna know how to ditch the guesswork and actually see where you're winning (or stumbling)? In this blog by Typo, we're diving into the cool world of software metrics: discover which numbers really matter, from how often you deploy (are you Uber-fast?) to how solid your code is (Netflix-level reliable?). Plus, learn how to set goals that actually make sense and see how the big leagues do it.
Get nerdy with data?
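If you'd rather get hands-on right away, here's a minimal sketch (ours, not from the Typo blog, with made-up deploy timestamps) of computing one such KPI, deployment frequency, from raw deploy times:

```python
from datetime import datetime, timedelta

def deployment_frequency(deploy_times: list[datetime], window_days: int = 30) -> float:
    """Average deploys per day over a trailing window (a DORA-style KPI)."""
    cutoff = max(deploy_times) - timedelta(days=window_days)
    recent = [t for t in deploy_times if t >= cutoff]
    return len(recent) / window_days

# Hypothetical data: three deploys over the last month.
deploys = [datetime(2025, 3, day) for day in (3, 12, 27)]
print(f"{deployment_frequency(deploys):.2f} deploys/day")  # -> 0.10 deploys/day
```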
Article of the Week ⭐
“Though these LLMs have advanced rapidly over the past few years and will likely continue to do so, they're not skilled enough at software engineering to replace real-life people quite yet…”
OpenAI Researchers Find That Even the Best AI Is “Unable to Solve the Majority” of Coding Problems
“Real-life people” has an eerie undertone in the age of AI adoption in software engineering. The OpenAI research team has been experimenting with SWE-Lancer, a benchmark that puts flagship LLMs to the test against real coding problems sourced from the freelance platform Upwork.
SWE-Lancer encompasses both independent engineering tasks — ranging from $50 bug fixes to $32,000 feature implementations — and managerial tasks, where models choose between technical implementation proposals.
This experiment is one of the first to attach a monetary metric to the tasks performed. In a nutshell, the analysis shows how effective the LLMs are at solving various types of tasks relative to the salary/budget payout a freelancer would have earned for the same work.
Engineering vs Management
In aggregate, the tested models, which included Claude Sonnet and two ChatGPT iterations, performed well below the halfway point for earnings. Pure engineering tasks capped at 30% of potential earnings, with most results under 20%. Management tasks, surprisingly, peaked at 45%, underscoring that the complexity of everyday software engineering work remains unsolved.
The payouts do not include the costs of running the models, which would further diminish their efficacy on the benchmark compared to a “real-life person” software engineer.
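For intuition, here's a back-of-the-envelope sketch of the earned-value idea (our illustration, not OpenAI's actual evaluation harness): dollars attached to the tasks a model solves, divided by the total dollars on offer.

```python
# Hypothetical tasks: (payout in USD, whether the model's solution passed).
tasks = [
    (50, True),      # small bug fix, solved
    (1_000, False),  # mid-size feature, failed
    (32_000, False), # large feature implementation, failed
]

earned = sum(payout for payout, solved in tasks if solved)
potential = sum(payout for payout, _ in tasks)
print(f"Earned ${earned:,} of ${potential:,} ({earned / potential:.1%})")
# -> Earned $50 of $33,050 (0.2%)
```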
Other highlights 👇
Improving Team Morale is not an Objective
Thank you for the tl;dr:
Morale is a flawed metric that obfuscates the real underlying problems.
Morale is a byproduct of everything going on in the company.
Instead of focusing on mood or morale, gather qualitative feedback on specific topics.
Marc Gauthier makes a strong case against chasing morale directly. Just as software quality emerges over time from disciplined refactoring, a resilient, high-performing team emerges from proactively addressing the real underlying problems. His tongue-in-cheek example is a reminder that a healthy team culture is the backbone of sustained success, and its expression matters more than its measurement.
Low morale is a symptom, not a dwindling resource. It is a byproduct of actions taken by team members, and especially by leadership and management people like yourself, when addressing friction points such as:
Handling low performance
Balancing work with personal and global events
Company restructuring, layoffs, and new challenges that come with intense feedback
Avid readers of this software engineering intelligence community will note Marc's focus on surveying DevEx satisfaction as an indicator of where to take action, instead of general mood/vibe checks.
Unit Testing Basics
Unit testing is an essential habit for building reliable, maintainable software. By catching bugs early, you build a foundation for continuous improvement. In essence, it's a practice that pays dividends, much like regularly maintaining your car to avoid costly breakdowns later.
Unit tests are designed to:
Verify behavior
Detect defects early
Enable refactoring
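As a quick refresher, here's a minimal example of what such a test looks like in practice (our sketch, with a made-up calculate_total function as the system under test):

```python
import unittest

def calculate_total(prices: list[float], discount: float = 0.0) -> float:
    """Sum prices and apply a fractional discount (hypothetical SUT)."""
    return sum(prices) * (1 - discount)

class TestCalculateTotal(unittest.TestCase):
    def test_sums_prices(self):
        # Verifies behavior: the documented contract, not the implementation.
        self.assertEqual(calculate_total([10.0, 5.0]), 15.0)

    def test_applies_discount(self):
        # Catches defects early: a wrong sign or factor fails immediately.
        self.assertAlmostEqual(calculate_total([100.0], discount=0.2), 80.0)

if __name__ == "__main__":
    unittest.main()
```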
Of course, there are ongoing debates, spanning decades, over the right definitions of “behavior”, “integration vs. unit”, and “defect vs. bug”. Emmanuel pinpoints an all-encompassing purpose: to serve as a first line of defence for software engineering quality.
But there are more types of tests than craters on the moon, and our industry has developed categories to help navigate this landscape. The article highlights the key characteristics that make for easy maintenance and refactoring:
Solitary vs. Sociable
Picking the right level of isolation is a delicate balance. Too much isolation makes a solitary test brittle by coupling it to the internal structure of the system under test (SUT).
A higher vantage point creates more leeway and independence from internal structure, but may introduce social dependencies that are not relevant to the behavior being tested.
Solitary: Focus on logic within a small unit
Sociable: Validate behavior and collaboration between a larger unit of objects and functions
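To make the distinction concrete, here's a small sketch (a hypothetical OrderService and PriceCatalog of our own, not from the article) testing the same behavior both ways:

```python
import unittest
from unittest.mock import Mock

class PriceCatalog:
    """Real collaborator: looks up unit prices."""
    def price_of(self, sku: str) -> float:
        return {"apple": 1.5, "pear": 2.0}[sku]

class OrderService:
    def __init__(self, catalog):
        self.catalog = catalog

    def total(self, skus: list[str]) -> float:
        return sum(self.catalog.price_of(sku) for sku in skus)

class SolitaryTest(unittest.TestCase):
    def test_total_with_mocked_catalog(self):
        # Solitary: the collaborator is mocked, so only OrderService logic is exercised.
        catalog = Mock()
        catalog.price_of.return_value = 2.0
        self.assertEqual(OrderService(catalog).total(["x", "y"]), 4.0)

class SociableTest(unittest.TestCase):
    def test_total_with_real_catalog(self):
        # Sociable: OrderService and the real PriceCatalog are validated together.
        self.assertEqual(OrderService(PriceCatalog()).total(["apple", "pear"]), 3.5)

if __name__ == "__main__":
    unittest.main()
```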
Find Yourself 🌻
That’s it for Today!
Whether you’re innovating on new projects, staying ahead of tech trends, or taking a strategic pause to recharge, may your day be as impactful and inspiring as your leadership.
See you next week(end), Ciao 👋
Credits 🙏
Curators - Diligently curated by our community members Denis & Kovid
Featured Authors - Noor Al-Sibai & OpenAI Research, Marc G Gauthier,
Sponsors - This newsletter is sponsored by Typo AI - Ship reliable software faster.
1) Subscribe — If you aren’t already, consider becoming a groCTO subscriber.
2) Share — Spread the word amongst fellow Engineering Leaders and CTOs! Your referral empowers & builds our groCTO community.