# How People Use ChatGPT

## Abstract

> Despite the rapid adoption of [LLM](https://wiki.g15e.com/pages/Large%20language%20model.txt) chatbots, little is known about how they are used. We document the growth of [ChatGPT](https://wiki.g15e.com/pages/ChatGPT.txt)'s consumer product from its launch in <November 2022> through <July 2025>, when it had been adopted by around 10% of the world's adult population. Early adopters were disproportionately male but the gender gap has narrowed dramatically, and we find higher growth rates in lower-income countries. Using a privacy-preserving automated pipeline, we classify usage patterns within a representative sample of ChatGPT conversations. We find steady growth in work-related messages but even faster growth in non-work-related messages, which have grown from 53% to more than 70% of all usage. Work usage is more common for educated users in highly-paid professional occupations. We classify messages by conversation topic and find that "Practical Guidance," "Seeking Information," and "Writing" are the three most common topics and collectively account for nearly 80% of all conversations. Writing dominates work-related tasks, highlighting chatbots' unique ability to generate digital outputs compared to traditional search engines. Computer programming and self-expression both represent relatively small shares of use. Overall, we find that ChatGPT provides economic value through decision support, which is especially important in knowledge-intensive jobs.

- https://www.nber.org/system/files/working_papers/w34255/w34255.pdf
- https://openai.com/index/how-people-are-using-chatgpt/

## 3. Data and Privacy

The privacy-preserving classification pipeline:

> Messages are categorized according to 5 different LLM-based classifiers. The classifiers are introduced in more detail in Section 5, their exact text is reproduced in Appendix A, and our validation procedure is described in Appendix B. …
>
> The messages are … then classified according to classifiers defined over a controlled label space—the most precise classifier we use on the message-level data set is the O\*NET Intermediate Work Activities taxonomy, which we augment to end up with 333 categories.
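
The idea of classifying over a controlled label space can be sketched roughly as follows. This is a minimal illustration, not the paper's pipeline: `LABELS`, `FALLBACK`, `classify`, and `keyword_stub` are all hypothetical names, the three labels are borrowed from the abstract only as placeholders (the taxonomy quoted above has 333 categories), and the stub stands in for an LLM call so the example runs offline. The point it shows is that whatever the model returns, only a label from the fixed set ever leaves the classifier.

```python
from typing import Callable, FrozenSet

# Hypothetical, tiny label space used only for illustration.
LABELS: FrozenSet[str] = frozenset(
    {"Practical Guidance", "Seeking Information", "Writing"}
)
# Assumed catch-all for outputs that fall outside the label space.
FALLBACK = "Unclear"


def classify(
    message: str,
    llm: Callable[[str], str],
    labels: FrozenSet[str] = LABELS,
) -> str:
    """Return a label from `labels`, or FALLBACK if the model strays."""
    raw = llm(message).strip()
    # Free-form model output is never passed through; only known labels are.
    return raw if raw in labels else FALLBACK


def keyword_stub(message: str) -> str:
    """Stand-in for an LLM classifier (assumption, not a real API)."""
    text = message.lower()
    if "rewrite" in text or "draft" in text:
        return "Writing"
    if "how do i" in text:
        return "Practical Guidance"
    return "some free-form answer outside the label space"


label = classify("Please rewrite this email", keyword_stub)
```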

Summary:

> To summarize, the key elements of our approach are:
>
> **Automated classification of messages**. In the course of analysis, no one ever looked directly at the content of user messages: all of our analysis of the content of user messages is done through output of automated classifiers run on de-identified and PII-scrubbed usage data.
>
> **Aggregated employment data via a data clean room**. We analyze and report aggregated employment data through a secure data clean room environment: no one on the research team had direct access to user-level demographic data and none of our analyses report aggregates for groups with less than 100 users.
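
The minimum group size rule in the last point can be sketched as a simple suppression step. This is an illustration under stated assumptions, not the clean-room implementation: the `(user_id, group)` row shape, the function name `reportable_group_sizes`, and the example groups are hypothetical. Only the threshold of 100 users comes from the quoted text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Threshold taken from the quoted passage; everything else is assumed.
MIN_USERS = 100


def reportable_group_sizes(
    rows: Iterable[Tuple[str, str]],
    min_users: int = MIN_USERS,
) -> Dict[str, int]:
    """Count distinct users per group, dropping groups below `min_users`."""
    users_by_group: Dict[str, Set[str]] = defaultdict(set)
    for user_id, group in rows:
        # A set ensures a user with many rows is counted once.
        users_by_group[group].add(user_id)
    return {
        group: len(users)
        for group, users in users_by_group.items()
        if len(users) >= min_users
    }


# Hypothetical data: one group at the threshold, one just below it.
rows = [(f"u{i}", "group_a") for i in range(100)]
rows += [(f"v{i}", "group_b") for i in range(99)]
sizes = reportable_group_sizes(rows)
```

Here `group_b` has 99 distinct users, so it is left out of `sizes` entirely rather than reported with a small count.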

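## Notes

What drew me to this paper, more than anything else, was the "privacy-preserving automated pipeline." Having an LLM read the messages in place of humans and classify them within predefined categories (<Controlled vocabulary>) is almost identical to an approach I devised for a project not long ago. --ak, <2025-09-17>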