We’re Still Missing 'Ethical AI' From Start to Finish
In bridging the digital divide, A.I. labeling platforms like ‘Sama’ help transform training data for Rapidly Developing Nations
By combining its advanced data training and imaging annotation technology with a social-good mission, Sama has established itself as a model for the future of the artificial intelligence industry: one that leads innovation and empowers people around the world.
If you aren’t familiar with Sama and its infrastructure, it’s worth getting to know, given its founding roots. Launched in 2008 by late founder Leila Janah, Sama is a San Francisco-based tech company that delivers high-quality training data, which feeds ML algorithms for a range of industries, powering robot-assisted surgery, autonomous vehicles, personalized online shopping experiences, and more.
The company has gained the trust and loyalty of some of the world’s leading brands, including, but not limited to, Walmart, Google, GM, and NVIDIA.
Janah, who passed away in January 2020 due to complications from epithelioid sarcoma, as covered by The New York Times, launched Sama as the result of her research and experience working as a consultant at Katzenbach Partners (now Booz and Company) and the World Bank.
Having been exposed to the harsh realities of poverty from an early age, Janah became interested in global development while teaching English and creative writing in rural Ghana through a high-school scholarship. She then went on to complete her degree in African Development Studies from Harvard University.
At the beginning of this year, the company rebranded to Sama, having raised $15 million in venture capital, according to Crunchbase. Following Janah’s passing, her longtime business partner and friend, Wendy Gonzalez, transitioned out of her role as the organization’s COO to take over as interim CEO.
Gonzalez spent the previous five years working alongside Janah to craft Sama’s vision and strategy. Since then, Gonzalez has stepped into the position of full-time CEO, leading the company towards achieving three-year revenue growth of 134% to gain a spot on the Inc. 5000 list as one of America’s fastest-growing private companies.
I spoke with Gonzalez in more depth about the importance of making A.I. more accessible to less-developed nations, and about how algorithms depend on accurate, unbiased, and high-quality data being collected and distributed.
Today, poor data is a major bottleneck for machine learning models and, according to the CEO, is why eight out of ten algorithms fail. As a company, Sama specializes in image, video, and sensor data annotation and validation for machine learning (ML) algorithms across industries including manufacturing and robotics, bio and medtech, autonomous vehicles, entertainment, e-commerce, retail, and agriculture.
#1: The Missing Piece
"When it comes to A.I., algorithms are only as good as the data that powers them," Gonzalez told me. "In other words, ‘garbage in, garbage out.’ That’s why our team at Sama combines an expert human-in-the-loop team with cutting-edge technology to provide our customers with the highest quality training data for their A.I. projects."
As part of its social mission, Sama continues to build out a global presence throughout East African countries, India, and other rapidly developing nations; its efforts at bridging the digital divide have been central to its goal of shaping a more ethical future for artificial intelligence.
The industry lacks what Gonzalez calls "ethical A.I.," with ethical and impact implications having fallen by the wayside. "As AI adoption accelerates, major industry players have remained focused on creating and developing the most advanced algorithms and products available to get ahead of the competition and establish themselves as leaders in the space," she said, calling it a one-goal focus.
However, when it comes to developing impact-led business models, Gonzalez believes that A.I. organizations might not always know where to start, especially since an ethical A.I. supply chain requires an in-depth understanding of the entire lifecycle of tech from inception to implementation.
"Since the A.I. and ML industry is still in its infancy, the tech industry’s perspective is centered around its applications and ability to transform the world around us technologically. If this focus could shift to center around ethics and impact, we could develop and understand the ways that A.I. can help us better achieve our social goals."
While Gonzalez and Sama were still processing and recovering following Janah’s passing, the COVID-19 pandemic continued to sweep the globe, presenting what Gonzalez described as a "particularly tumultuous" time for the company.
"Shortly after losing our founder, CEO, and friend, Leila, our team was thrown into the realities of a pandemic and remote work," she said. "Still, we’ve remained committed to providing our customers with the high-quality training data they need to power their ambitious A.I. projects."
A company like Sama, whose mission is so deeply rooted in global relationships and technology, was forced to take "swift internal action to ensure its employees were able to safely maintain their jobs and deliver secure, high-quality training data to customers. To do so, the company launched SamaHome, a solution that provided safe living conditions to Sama employees in Nairobi and Uganda. As needs changed, we piloted a WFH program, ensuring employees had consistent access to resources like internet infrastructure to enable success."
As if Janah’s passing and COVID-19 weren’t enough for Sama to deal with, socio-political calls to address racial justice and systemic injustice grew louder, arriving on the heels of rapid growth in the global use of A.I. between 2015 and 2019.
"Between 2015 and 2019, the global use of artificial intelligence grew 270 percent," Gonzalez pointed out, "which resulted in estimates that 85% of Americans are already using A.I. products daily." However, part of the missing piece in the ethical dilemma surrounding A.I. is that its inherent bias, according to the CEO, threatens to diminish the technology’s progress.
"Understanding that societal change must be reflected in our ability to create a more just and accurate future of A.I., we set out to detect and correct these biases with our bias detection solution."
#2: Annotating and Validating Multimedia is Essential to A.I. Growth and Development
With the United States, China, and Japan owning close to 80% of all technology, it’s ironic that so few people have access to the internet, particularly in less-developed countries.
"While projections place the A.I. industry's value at $13T by 2030," Gonzalez explains, "just three countries - the United States, China, and Japan - own almost 80% of the technology; meanwhile, only one in five people have access to the internet in the least developed countries, excluding them from economic advancement and further widening the digital divide between these nations and the rest of the world."
In an ‘eat or be eaten’ world, nations that lack the skills required to participate in the digital economy will likely be left behind. Gonzalez attested to this, emphasizing the importance of providing training data agents with self-paced technical and soft-skill training that enables them to progress "not only within the company but also in tech-careers long-term."
"It’s all about democratizing access to the growing A.I. industry," she added.
But the reality is that in less-developed countries such as Kenya, India, and Uganda, few people have access to the internet.
"Having access to the internet requires a vast and complex network of infrastructure that isn’t currently available globally," Gonzalez explains. "As a result, only 35% of populations in developing countries have access to the Internet today."
And to Gonzalez’s point, there is a clear digital divide here, which is only amplified in rural areas.
"This digital divide creates significant hurdles for developing nations, especially in an increasingly digital world that requires the internet for connectivity, economic and human development. These problems are only amplified in rural areas where finding reliable service at an affordable price is often impossible. This is in part due to the high cost of maintaining network infrastructure in these regions as well as a lack of demand for access. In these regions, access to the internet can cost 2-3x more than in an urban area."
And as many lives and businesses faced challenges in 2020, we had to find a way to persevere. Sama had to ensure its employees could safely maintain their jobs and deliver secure, high-quality training data to customers throughout the pandemic. To achieve this, Sama launched SamaHome, a solution that provided safe living conditions to its employees in Nairobi and Uganda.
"To ensure that our impact workforce had access to the hardware and broadband connectivity they needed for remote work, we worked with local Internet Service Providers (ISPs) to lay 34 miles of fiber optic cable in neighborhoods where high-speed internet connectivity was limited. The majority of our impact workforce did not have access to the internet prior to COVID. Beyond allowing our workers to continue working, the onset of this access has provided the connectivity and convenience that the internet brings to themselves and their loved ones."
Preparing training data, as Gonzalez shares, is the process of annotating and validating images and videos for A.I., which is essential to algorithm development and success.
"With estimates projecting the global artificial intelligence market value at $390.9 billion by 2025, A.I. is becoming increasingly essential to the technologies we utilize daily," she says. "However, 8 out of 10 AI projects with insufficient training data fail. As AI technology is increasingly deployed across industries, it’s critical to ensure the data used within the AI algorithms is properly trained to avoid bias."
But what do these biases affect beyond accuracy?
"Beyond generating inaccuracies, these biases influence critical decisions in hiring, safety and criminal justice," she says, explaining that these errors often result from improperly prepared training data, which occurs when datasets are not diverse enough or include irrelevant data.
The CEO pointed to a study conducted by MIT Media Lab researcher Joy Buolamwini that explored "gender classification systems sold by IBM, Microsoft, and Face++", which she says were found to have "an error rate as much as 34.4 percentage points higher for darker-skinned females than lighter-skinned males." Similarly, in Broward County, Florida, a law enforcement program used to predict the likelihood of crime was found to "falsely flag black defendants as future criminals (...) at almost twice the rate as white defendants."
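The kind of disparity described above, an error-rate gap between demographic groups measured in percentage points, can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the toy records, the group names, and the function names are hypothetical and are not taken from the study or from Sama's tooling.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return {group: error_rate} for (group, true_label, predicted_label) records."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        if truth != predicted:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def max_gap_percentage_points(rates):
    """Largest difference between any two groups' error rates, in percentage points."""
    return (max(rates.values()) - min(rates.values())) * 100


# Hypothetical toy data: (group, true label, predicted label).
records = [
    ("group_a", "f", "f"), ("group_a", "f", "m"),
    ("group_a", "f", "f"), ("group_a", "f", "f"),
    ("group_b", "m", "m"), ("group_b", "m", "m"),
    ("group_b", "m", "m"), ("group_b", "m", "m"),
]

rates = error_rates_by_group(records)
gap = max_gap_percentage_points(rates)
print(rates, gap)
```

Reporting the gap in percentage points rather than as a relative percentage keeps the comparison readable even when one group's error rate is close to zero.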
Sama’s Training Data Platform reflects the company’s high-quality annotation technology and a team of experts who validate millions of image points, photos, and videos to remove bias and create successful foundations for A.I. algorithms.
Specifically, the platform consists of scalable feedback loops created by a repeating cycle in which expert humans train the machine and the machine assists the humans, called ML Assisted Annotation technology.
The company says that so far, ML Assisted Annotation has yielded strong results, including a 300% acceleration in annotation time while maintaining Sama’s industry-leading quality, with most projects falling above 96% accuracy. "Additionally, through integrated API, automated workflows, and workflow management capabilities, our customers can integrate and track annotation in record time," she added.
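The article does not describe how ML Assisted Annotation works internally. As a rough illustration of the general human-in-the-loop idea only, one common pattern is to accept a model's label when its confidence is high and send everything else to a human annotator. The Python sketch below assumes that pattern; the function name, the 0.9 threshold, and the sample predictions are all hypothetical and do not represent Sama's platform or API.

```python
def route_annotations(predictions, threshold=0.9):
    """Split model pre-labels into auto-accepted items and items for human review.

    predictions: iterable of (item_id, label, confidence) tuples.
    threshold: hypothetical confidence cutoff, not a value from the article.
    """
    auto_accepted = []
    needs_review = []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            auto_accepted.append((item_id, label))
        else:
            # Low-confidence items go to an expert annotator, whose corrected
            # labels could later be fed back in as new training data.
            needs_review.append((item_id, label))
    return auto_accepted, needs_review


# Hypothetical model output for three images.
predictions = [
    ("img_001", "car", 0.97),
    ("img_002", "pedestrian", 0.62),
    ("img_003", "bicycle", 0.91),
]

accepted, review_queue = route_annotations(predictions)
print(accepted)
print(review_queue)
```

In a loop like this, the threshold trades speed against quality: lowering it sends fewer items to people, while raising it puts more labels in front of a human reviewer.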
#3: Debriefing the ‘Randomized Control Trial’ Alongside MIT
Having helped lift more than 53,000 people out of poverty in countries such as Kenya and Uganda, Sama recently completed a Randomized Control Trial, or RCT, with the Massachusetts Institute of Technology (MIT), which found that workers receiving both training and a job referral from Sama saw almost 40% higher earnings and 10 percentage points lower unemployment than their non-Sama counterparts.
The RCT took place from 2017 to 2020, evaluating how training alone or training and an opportunity to work for Sama impacted factors such as employment rates, average earnings, and well-being for individuals from low-income and marginalized backgrounds in Nairobi, Kenya.
"During the study, individuals that participated were split into three groups, a Control Group that received neither training nor employment at Sama, Group 1, which only received training, and Group 2, which received training and the opportunity to work at Sama."
After three years of surveying these three groups, Sama learned that individuals that received both training and an opportunity to work at Sama exhibited lower unemployment rates and higher average monthly earnings in comparison to individuals in Group 1 and the control group.
"Furthermore, the impact of training and employment at Sama was even more significant for women. The RCT found a 270% increase in women’s monthly earnings after joining Sama, 60% higher earnings for women in Group 2 compared to the control group as well as 40% higher wages for people who were trained and had an opportunity to work at Sama in comparison to the control group."
The company’s three-year investment in the study demonstrates the value of its long-term commitment to transparency and an ethical A.I. supply chain.
A Message about Leila…
The company’s non-profit arm, the Leila Janah Foundation, was created to accelerate the adoption of social enterprises that give work to those in underserved communities. Janah coined the phrase "Give Work" and founded a movement to bring her vision to life.
After Janah’s passing, the nonprofit was renamed to honor her legacy, Gonzalez shared with me. Today, in addition to advising and providing oversight to Sama, the Leila Janah Foundation is focused on building its signature program - the Give Work Challenge - to accelerate entrepreneurship in East Africa.
Since 2018, interest in the Give Work Challenge has grown and 10 businesses in Kenya and Uganda have been named winners. In 2020, the Give Work Challenge had its most competitive cohort to date. More than 110 businesses, all of whom had to demonstrate how their business models aligned with at least two UN Sustainable Development goals, applied for funding. Of the more than 110 applicants, 18 semi-finalists and four winners were selected.
"I strongly believe you can combine the highest quality of service with the core mission of altruism," Janah shared with TechCrunch prior to her passing. From Janah’s perspective, providing for-profit AI training data to global companies can be done while improving lives in underserved and underrepresented regions.
Opinions expressed by DZone contributors are their own.