AI is a transcript of our world

1.4/50 Summilux ASPH, Leica M10P, RAW

(This is a sequel to the following blog post.)
kaz-ataka.hatenablog.com

I often hear people argue that the output of machine learning-based AI is problematic.

When they look at the results, they find them:

Extremely skewed toward men
Extremely skewed toward people of European descent
Extremely liberal (left-wing in the English-speaking sense of the word)
Dominated by arguments that lean toward the wealthy
Extremely skewed toward people with good physiques and looks

These feelings are very understandable, but given the nature of machine learning, such results are often unavoidable. Machine learning-based AI is, in essence, a substantial computing environment on which algorithms such as text processing and machine learning are implemented and then trained for a specific purpose on a large amount of experience.

That experience can be generated first-hand (including in virtual space) when the task is a game or a physical operation such as picking or driving, where each action produces an outcome. In most cases, however, existing data is used.

Existing data presents a challenge to those who use it in two ways.

First, it contains a large amount of material that is not necessarily factually correct. Call this the reliability of the information: whether it is trustworthy or not.

Second, it contains a lot of material that is factually correct but socially unacceptable. Call this the social justice of the information: whether it is socially acceptable or not.

Taking search as an example of the most widely used machine learning-based AI tool, the first issue has been a fundamental problem since the birth of search.

Early search engines leaned on government and other trustworthy information sites, on sites listed in the Yahoo! directory (which, when search was born, was the most painstakingly hand-curated and trusted catalogue of the Web), and on PageRank, invented by Larry Page (the name plays on both Page's surname and web pages). It is almost certain that search platforms still evaluate, quite broadly and deeply, how heavily an information source is used, the credibility of the site, and the credibility of the person who produced the article.
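For readers who want to see the idea behind PageRank, here is a minimal sketch: a page is scored by how much score flows into it through links from other pages. This is only the textbook power-iteration form, not the ranking any search engine actually runs today. The toy link graph and its page names (gov, directory, blog, orphan) are hypothetical, and the damping value of 0.85 is simply the commonly cited default.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages point at "gov", nothing points at "orphan".
toy_web = {
    "gov": ["directory"],
    "directory": ["gov", "blog"],
    "blog": ["gov"],
    "orphan": ["gov"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page:10s} {score:.3f}")
```

In this toy graph the page that many others link to comes out on top, while the page nobody links to keeps only the baseline share. That is the sense in which links act as a rough proxy for trust.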

Incidentally, before the Web, only publishers, newspapers, and TV stations could deliver information to large audiences. The choice of outlet introduced plenty of bias, but the credibility of the content was guaranteed to a fair degree. With the rise of social media such as Twitter, YouTube, and TikTok, the information space has changed dramatically, in the sense that a considerable share of information is now suspect.

Google and the former YST (Yahoo! Search Technology), as well as Bing, Baidu, Naver, and Yandex, have long invested enormous energy in ensuring that the first page of results always contains information that is wanted (relevant to the user's interest), useful, and fresh. The so-called "ten blue links" are the result. Anyone who used web services more than 25 years ago will remember that it was common to page through several screens of search engine results before reaching the information you wanted. Considering that our vast collective search histories have refined these results, this is a great human edifice, the product of the endless efforts of billions of people.

The second issue is discussed less often, but from a social justice perspective it is much more important, and also much more difficult. It is directly related to what has recently come to be called Diversity, Equity, and Inclusion (DE&I), because what counts as acceptable has changed radically over time.

When I, now in my early fifties, was a child, there were honestly only about five major DE&I-like issues in Japanese society.

First, eugenic discrimination. Although Nazi Germany made it infamous, it actually has extremely deep roots going back to Plato, and it casts a shadow over the other axes as well. Japan, too, kept its Eugenic Protection Law until 1996. In its later years the law survived mostly as a shell that served as the legal basis for abortion, but its existence is still a grave wrong. I cannot begin to describe the agony of those who were sterilized and of those around them.

Second, racial and ethnic discrimination, especially the issue of Black liberation. The issue is entangled with memories of the colonial and slavery eras, and at its root lies the problem of accepting difference, of accepting that we are not all alike. It includes discrimination against "zainichi" in Japan and the Yellow Peril theory in the pre-war United States.

Third, discrimination against women. This is the history of women's liberation and coeducation, which began simultaneously in the U.S. in the late 1960s. It also includes discrimination in hiring and promotion between men and women in the workplace.*1

Fourth, the issue of wealth and poverty. This is what is now called the social divide. I suppose that stories of heroes clawing their way out of poverty, such as Tiger Mask and Star of the Giants, carried great social significance when they aired on TV.

Fifth, national discrimination. At that time this was represented by the North-South divide. It is often intertwined with issues of racial and ethnic discrimination.

Look at the same issues today. On eugenic ideas, the success and rights of people with various disabilities are now taken for granted, as seen in the Paralympics (an excellent development). At the same time the debate has become considerably more complicated, with the issue resurfacing in connection with designer babies and gene therapy. It is quite difficult to work out what is socially acceptable and to what extent.

Race is a biologically meaningless concept, and open opposition to eliminating discrimination based on it has all but disappeared. Yet the prejudice runs very deep and the problem has not been eliminated. As a result, it has become a much more sensitive issue than it once was, and the range of acceptable expression is extremely narrow. Ethnic issues have also become major political issues with neighboring countries, something that was not widely debated back then.

Gender parity is finally being recognized in Japan as a problem to be solved, but nowadays gender issues naturally also include sexual minorities, as represented by LGBTQ. Japan's binary male/female classification lags well behind the global standard of female, male, non-binary, and prefer not to say. Here, too, the common sense of the past is no longer acceptable.

In the past, the body shape of the people in ads was never an issue. But even Victoria's Secret (the leading women's underwear brand in the U.S.), which produced supermodels such as Tyra Banks, Naomi Campbell, and Miranda Kerr, decided to discontinue its Angels program in 2018 and moved to VS Collective, which highlights partners with unique backgrounds, interests, and passions (including Naomi Osaka). Advertisements featuring only beautiful men and women with good figures are already out of the question, and body diversity is now an inevitable trend.*2

The divide between rich and poor is becoming more and more serious, yet there is a strange reluctance to discuss it openly. Meanwhile, although the connection is only loose, tolerance for drinking and smoking has dropped dramatically.

National discrimination is easing considerably with the prosperity of Southeast Asia (led by Singapore), China, India, Latin American countries such as Brazil, and some African countries, and as a result the range of permissible expression about these countries has changed drastically. On the other hand, problems that did not exist back then have emerged from terrorism and international politics after 9/11, such as those related to the Taliban and the Islamic State (IS).

Animal rights, which few people cared about at the time, are now a sensitive topic, and if you say anything careless with the sensibilities of the 1990s (30 years ago), you will step on a landmine.

In short, it is a completely different world than 30-40 years ago. Much of what was once tolerated is no longer allowed.*3

Nevertheless, if we copy the world's data just as it is, the entire memory of these past societies is copied along with it.

This means that what gets copied is a world full of information that is not "politically correct/socially acceptable" in today's eyes. The problem is not just the bias in the information that has been digitized, nor just its trustworthiness. The white and gray areas of DE&I are moving targets, and the boundaries of what is acceptable shift over time. In other words, it is virtually impossible to completely eliminate this challenge from machine learning-based AI.

Machine learning-based AI first swallows whole all the information that seems trustworthy, then serves it back weighted by the importance of the data (how the data is distributed and whether people use it). This should be true both for search and for large language model (LLM)-based AI like ChatGPT. The result is that the output is tainted not only by social bias but also, inevitably, by the norms under which society has operated.
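To make "serving information back according to the distribution of the data" concrete, here is a deliberately tiny sketch. It is not how search or an LLM such as ChatGPT is built. It only counts which word follows which in a corpus and reports the resulting proportions. The corpus, its 8-to-2 split, and the function names fit_next_word and next_word_probs are all made up for illustration.

```python
from collections import Counter, defaultdict


def fit_next_word(corpus):
    """Count, for every word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, word):
    """Turn the raw follow-up counts for one word into proportions."""
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}


# Hypothetical corpus: the skew is put in on purpose (8 sentences vs. 2).
toy_corpus = (
    ["the engineer said he would help"] * 8
    + ["the engineer said she would help"] * 2
)

model = fit_next_word(toy_corpus)
probs = next_word_probs(model, "said")
print(probs)
```

The "model" has no opinion of its own. The proportions it reports after "said" are exactly the proportions in the text it was given, so whatever skew the world's text contains comes straight back out.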

That said, just as you cannot remove criminal or discriminatory terms from a dictionary, removing such information would make a tool like search much less useful. First, the information itself is worth looking up. Second, although I will not go into detail here, most search terms (queries) form a huge long tail of items that may be used no more than once a year, and the satisfaction of search users depends heavily on whether those queries get answered.

Therefore, as a matter of basic information literacy in the modern age, we need to understand deeply that the sources underlying AI contain information that is unacceptable on both of these axes, that both axes keep moving considerably, and that it is therefore impossible to create a completely clean tool.

Children should be taught this properly as well. It may be fine to start them on "safe search", but adult-level search needs to be opened up to them from around the time they enter junior high school, otherwise their interests will not be well served. At that stage there needs to be a forum where the challenges and risks can be discussed repeatedly, along with the principles of machine learning, using case studies.

You may feel you have had your fill by this point, but I would like to point out two more axes of the information we are given, in addition to Trustable/Acceptable.

The first is the bias that comes from the user's own orientation or inclination, of which they may not even be aware. This is the third axis. Machine learning absorbs more and more of your usage characteristics and produces more and more results that you like. This is called personalization.

Personalization is not necessarily applied only at the level of the individual. It also happens by language and by region. 災害 (Japanese) and "disaster" are processed differently. I don't know whether this is a problem, but it clearly creates an information bias. As an interesting example, open a browser in incognito mode (to strip out anything tied to your ID) and run an image search for "Beautiful woman" and then for खूबसूरत महिला (Hindi). You will see how different the results can be.

On top of this linguistic, regional, and social context comes the additional bias of the kinds of results you personally are shown. It is difficult to recognize this filter bubble or echo chamber problem unless you are keenly aware that the search results and chatbot responses you see are not generic. In fact, it may be better to do your searching and the like without logging in.
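The feedback loop behind personalization can be sketched in a few lines. This is an assumption-laden toy, not any real platform's ranking: each result has a base relevance score, and a bonus is added in proportion to how often this user has clicked that kind of result before. The topic names, scores, the weight of 0.5, and the function name rerank are all hypothetical.

```python
from collections import Counter


def rerank(base_scores, clicks, weight=0.5):
    """Order topics by base relevance plus a bonus for this user's past clicks."""
    total = sum(clicks.values())

    def score(topic):
        # Share of the user's past clicks that went to this topic (0 if none yet).
        affinity = clicks[topic] / total if total else 0.0
        return base_scores[topic] + weight * affinity

    return sorted(base_scores, key=score, reverse=True)


# Hypothetical base relevance for one query, before any personalization.
base_scores = {"viewpoint_a": 0.50, "viewpoint_b": 0.52, "other": 0.40}

clicks = Counter()
neutral = rerank(base_scores, clicks)       # what a logged-out user would see

# The same user clicks viewpoint_a three times.
for _ in range(3):
    clicks["viewpoint_a"] += 1
personalized = rerank(base_scores, clicks)  # what they see afterwards

print("neutral:     ", neutral)
print("personalized:", personalized)
```

Nothing about the underlying information changed between the two rankings. Only the user's own history did, which is why the bubble is so hard to notice from the inside.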

Finally, the fourth axis is the degree to which the information reflects what society actually thinks. For example, in the 2016 U.S. presidential election between former Secretary of State Hillary Clinton and Donald Trump, pre-election forecasts strongly favored Clinton, and many people said they would vote for Clinton when asked, yet Trump actually won quite clearly. The other day at a Pixie Dust (PXDT) event, Dr. Yoichi Ochiai, the head of Pixie Dust, told me that this axis is important when looking at information, and I was struck by his words. This Ochiai axis, or degree of honesty, is quite important, but I am not sure how it is reflected in the information we see today or in the results of machine learning that incorporates this information. More research is needed.

As a piece of basic literacy for the age of machine learning-based AI, I have tried to sort out some of the background and the implications of how it absorbs information. AI that has swallowed a transcript of our world is one of the greatest intellectual assets we have created, but there is a considerable amount to understand and keep in mind. I want to be able to use it knowing all of that.

Have fun with it!

ps. Click here for the original in Japanese.
kaz-ataka.hatenablog.com

*1:Although Japan has lagged behind, it is unquestionably just that men and women should have the same educational opportunities and the same representation in society. In line with this view, prestigious universities on the East Coast of the United States, which were originally all-male residential schools, opened their doors to women across the board in the late 1960s, and they have achieved gender parity since the end of the 20th century. The global consensus is that what has long been done in co-educational elementary, junior high, and high schools with a 1:1 gender ratio should also be done in higher education and in the workplace, especially at the decision-making level. The former U.S. Ambassador to Japan was a woman, and in Mexico complete gender parity has been realized even in the National Assembly, yet even now only 10% of the Japanese Diet is made up of women. (Reference) Times Higher Education - World University Ranking 2023: gender ratio is a basic evaluation item, and even Caltech and MIT, which focus on science and engineering, have approx. 40% women. Incidentally, it is expressed not as a male-to-female ratio but as a female-to-male ratio. This is the global standard.

*2:Comedians as representatives of the general public on Japanese TV variety shows have contributed greatly in this regard.

*3:In Japan, most of these issues are rarely discussed openly, except for those that are convenient to discuss (such as employment of people with disabilities and the number of female executives), because of the "cover up what smells" culture. This has left the country with an awareness that is decades behind the major countries of the world, and people, especially those in leadership positions, should be well aware of this. I also strongly recommend that you look at how your own operations and your company or organization are doing. As a personal note, a few years ago I was grilled for several hours at the embassy by a North American ambassador to Japan about Japan's bizarre lag on various DE&I attributes, and it really made me want to cry about my country's current state of affairs.

ニューロサイエンスとマーケティングの間 - Between Neuroscience and Marketing

安宅和人: 残すに値する未来を