PPE 4000 Research in PPE (Fall 2024)

Computational Text Analysis for Social Sciences

Author

Instructor: Pei-Hsun Hsieh

Course Information

Course Description

Language is the medium of social interaction. Recent advancements in computational methods for the quantitative analysis of text have unlocked the potential to explore vast amounts of unstructured textual data, thereby deepening our understanding of social interactions. This course offers an introductory overview of machine learning and natural language processing, with a focus on their applications in the behavioral and social sciences. Additionally, the course provides hands-on training in R and Python programming languages, emphasizing their use in data collection, machine learning, and natural language processing.

By the end of the course, you will be able to:

  • Use R or Python to perform computational text analysis tasks, such as text classification, topic modeling, and text similarity measurement, with appropriate models, and interpret the results.
  • Recognize and understand potential biases in AI.

Resources

  • Van Atteveldt, W., Trilling, D., & Arcila Calderón, C. (2022). Computational Analysis of Communication. Wiley Blackwell.

Course Requirements and Grading

Technical Requirements

Classes will be held at the UDAL Laboratory in PCPSE, which provides desktops equipped with all the necessary software. You will need your PennKey to log in to these desktops. You are also welcome to bring your own laptop if you prefer. The software we will use includes R and RStudio for R programming, and Google Colab (free and accessible through a browser with a Google account) for Python.

Grading

Your grade will be based on three components: class participation (30%), assignments (40%), and a final project (30%). Below is a summary of the main requirements for each element.

  • Class Participation (30%): Active participation is key in a research group setting. Each class will include in-class exercises, and you are encouraged to ask questions as you work through them. Participation will be graded based on:

    • Attendance: You are allowed two unexcused absences without penalty. For each additional unexcused absence, 5 points will be deducted from your participation grade (except for religious holidays, illness, or emergencies). If you anticipate an absence due to a religious holiday or another reason, please notify me at least one week in advance. For absences due to illness or emergencies, please provide appropriate documentation (e.g., a doctor’s note) within one week of the missed class.
    • In-Class Exercises: Submission of all in-class exercises is mandatory, whether or not you attend class. Failure to submit exercises with the expected results will result in a 2-point deduction for each missing or incomplete submission. Attending class should ensure you have no trouble completing and submitting these exercises.
  • Assignments (40%): Several take-home assignments will require you to apply models discussed in class and provide interpretations of the results.

  • Final Project (30%): For the final project, you will apply machine learning or natural language processing techniques to real-world data to conduct a meaningful analysis. You may use either R or Python. You must discuss your project plan with me before the midterm presentation and obtain approval. The final project consists of three parts:

    • Midterm Presentation (5%): In Week 12 (November 11 & 13 on the Course Schedule), you will present your project plan, including the research questions, the data you plan to use or collect, and the models you intend to apply.
    • Final Presentation (20%): During the last week of class, you will present the results of your analysis and provide interpretations of the findings.
    • Final Report (5%): A final report summarizing your project will be submitted, covering the research questions, data, models, results, and interpretations.
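
The participation deductions described above can be illustrated with a short sketch. This is not an official grade calculator: the function name `participation_score`, the 100-point starting value, and the floor at zero are assumptions made only for illustration. The two free absences, the 5-point absence penalty, and the 2-point exercise penalty come from the policy above.

```python
def participation_score(unexcused_absences, missing_exercises, base=100.0):
    """Hypothetical helper: participation score after deductions.

    Assumptions (not stated in the syllabus): participation starts at
    `base` points and cannot drop below zero.
    """
    free_absences = 2      # two unexcused absences carry no penalty
    absence_penalty = 5    # points per additional unexcused absence
    exercise_penalty = 2   # points per missing or incomplete exercise

    penalized_absences = max(0, unexcused_absences - free_absences)
    deduction = (penalized_absences * absence_penalty
                 + missing_exercises * exercise_penalty)
    return max(0.0, base - deduction)


# Three unexcused absences (one beyond the allowance) and one missing exercise.
print(participation_score(unexcused_absences=3, missing_exercises=1))  # 93.0
```

Excused absences (religious holidays, illness, or emergencies) are simply not counted in `unexcused_absences`.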

Grading Scale

Score    Grade
93+      A
90-92    A-
87-89    B+
84-86    B
80-83    B-
77-79    C+
74-76    C
70-73    C-
67-69    D+
64-66    D
60-63    D-
0-59     F
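
As a rough illustration of how the three grading components and the scale above fit together, the sketch below computes a weighted course score and looks up a letter grade. The weights and cutoffs are taken from this syllabus. The names `WEIGHTS`, `LETTER_CUTOFFS`, `course_score`, and `letter_grade` are hypothetical, and so are two assumptions: that each component is scored out of 100, and that scores are rounded half up to a whole number before the lookup. The syllabus does not specify either.

```python
import math

# Component weights from the Grading section.
WEIGHTS = {"participation": 0.30, "assignments": 0.40, "final_project": 0.30}

# Lower bound of each letter band, from the grading scale above.
LETTER_CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"), (67, "D+"), (64, "D"), (60, "D-"),
]


def course_score(scores):
    """Weighted sum of component scores (each assumed to be out of 100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(score):
    """Map a numeric score to a letter grade.

    Assumption: the score is rounded half up to a whole number first.
    """
    rounded = math.floor(score + 0.5)
    for cutoff, letter in LETTER_CUTOFFS:
        if rounded >= cutoff:
            return letter
    return "F"


total = course_score({"participation": 93, "assignments": 88, "final_project": 90})
print(round(total, 1), letter_grade(total))  # 90.1 A-
```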

Course Schedule

The instructor reserves the right to make reasonable changes to the syllabus and class/reading schedule during the course of the semester. Any changes to the syllabus will be announced on Canvas.

  • Week 1 (August 28): Introduction

  • Week 2 (September 2 & 4): No class

    • There will be no class this week due to Labor Day and the conference.
  • Week 3 (September 9 & 11): R Language: Programming Concepts

  • Week 4 (September 16 & 18): R Language: Strings, Regular Expressions, and Dictionary-Based Methods

  • Week 5 (September 23 & 25): Python: Programming Concepts

  • Week 6 (September 30 & October 2): Python: Strings, Regular Expressions, and Dictionary-Based Methods

  • Week 7 (October 7 & 9): Supervised Learning I: Text Classification with the Bag-of-Words Model

  • Week 8 (October 14 & 16): Text Similarity with Word Embeddings and Large Language Models (and Bias in AI)

  • Week 9 (October 21 & 23): Supervised Learning II: Text Classification with Large Language Models

  • Week 10 (October 28 & 30): Unsupervised Learning: Clustering Text and Topic Modeling

  • Week 11 (November 4 & 6): Downstream Tasks with Large Language Models: Named Entity Recognition and Part-of-Speech Tagging

  • Week 12 (November 11 & 13): Midterm Presentations

  • Week 13 (November 18 & 20): Optional Topics or Workshop

  • Week 14 (November 25 & 27): Optional Topics or Workshop

    • Friday classes meet on Wednesday, November 27, so we will not have class on this date.
  • Week 15 (December 2 & 4): Final Presentations

Course Policies

Attendance

Please refer to the Course Requirements and Grading section.

Office Hours & Email Policy

If you have any questions about the course, feel free to contact/chat with me during office hours, right before/after class, or by email. Please include “PPE 4000” in the email subject and your full name in the main text. I will get back to you within two business days. Please follow up if I don’t respond within that timeframe.

Late Work Policy

Late submissions of exercises and assignments are subject to a five-point penalty per day. There are no make-up presentations except in cases of illness, death in the family, religious observance, or other unusual circumstances. In such circumstances, accommodations will be granted on a case-by-case basis.
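
The daily penalty can be sketched as follows. The function name `late_adjusted_score` is hypothetical, and two details are assumptions rather than stated policy: a partial day late is counted as a full day, and a score cannot fall below zero.

```python
import math


def late_adjusted_score(raw_score, days_late, penalty_per_day=5):
    """Hypothetical helper: score after the daily late penalty.

    Assumptions (not stated in the syllabus): a partial day counts as a
    full day, and the adjusted score is floored at zero.
    """
    whole_days = math.ceil(max(0, days_late))
    return max(0, raw_score - penalty_per_day * whole_days)


# An assignment scored 90 and submitted two days late.
print(late_adjusted_score(90, 2))  # 80
```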

Academic Integrity

Make sure you are familiar with Penn’s Code of Academic Integrity (https://catalog.upenn.edu/pennbook/code-of-academic-integrity/). I have a zero tolerance policy for plagiarism and cheating, and all violations will result in substantial penalties. If you have questions about academic misconduct and plagiarism, please do not hesitate to contact me.

Use of AI

You can use generative AI tools as a personal learning assistant, but keep in mind that if you rely on them for everything without digesting and evaluating their responses with your own knowledge, you are not truly learning. While generative AI tools can answer simple questions, they may struggle with complex questions that fall outside their training data. Furthermore, an AI tool's understanding may differ from my expectations in this class.

For open-book assignments, using AIs to proofread your drafts is an appropriate use. However, for closed-book quizzes and exams, using AIs will be considered a violation of Penn’s Code of Academic Integrity.

Resources

Academic Support and Disability Services

The Weingarten Center offers a variety of resources to support all Penn students in reaching their academic goals. All services are free and confidential. To contact the Weingarten Center, call 215-573-9235. The office is located in Stouffer Commons, 3702 Spruce Street, Suite 300.

Academic Support

Learning consultations and learning strategies workshops support students in developing more efficient and effective study skills and learning strategies. Learning specialists work with undergraduate, graduate, and professional students to address time and project management, academic reading and writing, note-taking, problem-solving, exam preparation, test-taking, self-regulation, and flexibility.

Undergraduates can also take advantage of free on-campus tutoring for many Penn courses in both drop-in and weekly contract formats. Tutoring may be individual or in small groups. Tutors will assist with applying course information, understanding key concepts, and developing course-specific strategies. Tutoring support is available throughout the term but is best accessed early in the semester.

Disability Services

The University of Pennsylvania is committed to the accessibility of its programs and services. Students with a disability or medical condition can request reasonable accommodations through the Weingarten Center website. Disability Services determines accommodations on an individualized basis through an interactive process, including a meeting with the student and a review of their disability documentation. Students who have approved accommodations are encouraged to notify their faculty members and share their accommodation letters at the start of each semester. Students can contact Disability Services by calling 215-573-9235.

Penn Wellness Resources

You can find a number of different health resources from Wellness at Penn (https://wellness.upenn.edu/).

SHAC (Student Health and Counseling)

Website: https://wellness.upenn.edu/student-health-and-counseling

  • For Medical Services, students can go to 3535 Market Street, 1st Floor. The office is open Monday-Friday 9:00-4:30 and Saturday 9:00-11:30. For after-hours help, call 215-746-3535 (24/7). If the issue is life-threatening, call 911.
  • For Counseling Services, students can go to 3624 Market Street, 1st Floor West or call 215-898-7021. You can call this number 24/7 and a clinician will answer. Counseling Services offers free, confidential mental health services to all students at Penn.

If You Have Financial Difficulties

It is important to me that you have the resources you need to be able to focus on learning in this course – this includes both the necessary academic materials as well as taking care of your day-to-day needs.

Students experiencing difficulty affording the course materials should reach out to the Penn First Plus office (pennfirstplus@upenn.edu).

Students who are struggling to afford sufficient food to eat every day and/or lack a safe and suitable space to live should contact Student Intervention Services (vpul-sisteam@pobox.upenn.edu).

Students may also wish to contact their Financial Aid Counselor or Academic Advisor about these concerns.

You are welcome to notify me if any of these challenges are affecting your success in this course, as long as you are comfortable doing so – I may have resources to support you.

Disclaimer

I reserve the right to change the syllabus at any time. I will notify you through Canvas if this occurs, but it is also important that you stay up to date with all readings for this class.