pacman::p_load(readr, dplyr, stringr)
R Language: Strings, Regular Expressions, and Dictionary-Based Methods
In this script, we will use three packages:
- readr for reading .csv files.
- dplyr for data wrangling.
- stringr for string operations.
String operations
As mentioned earlier, strings are values that are neither numbers nor Boolean values—typically words or text meant for human reading. In R, any value surrounded by quotation marks is considered a string. R refers to strings as “characters.”
R provides some built-in functions for string operations, but in this lecture, we will focus on the stringr package (you can find the documentation and a cheat sheet on its website).
Let’s create a data frame my_course and play with the string values in it.
my_course <- data.frame(
  Course = c("Computational Text Analysis", "Addressing Contemporary Societal Challenges", "Independent Study", "Behavioral Economics and Psychology"),
  Section = c("PPE 4000-302", "PPE 4600-301", "PPE 3999", "PSYC 2750-401"),
  email = "phsieh@sas.upenn.edu",
  max_enroll = c(6, 18, NA, 60)
)
my_course
Course Section
1 Computational Text Analysis PPE 4000-302
2 Addressing Contemporary Societal Challenges PPE 4600-301
3 Independent Study PPE 3999
4 Behavioral Economics and Psychology PSYC 2750-401
email max_enroll
1 phsieh@sas.upenn.edu 6
2 phsieh@sas.upenn.edu 18
3 phsieh@sas.upenn.edu NA
4 phsieh@sas.upenn.edu 60
You can count the number of characters in a string using str_length() from the stringr package. For example:
str_length(my_course$Course)
[1] 27 43 17 35
To check if a string contains a specific pattern, use str_detect():
str_detect("Computational Text Analysis", pattern = "Computational")
[1] TRUE
str_detect("Addressing Contemporary Societal Challenges", pattern = "Computational")
[1] FALSE
To count how many times a pattern appears in a string, use str_count():
str_count("Computational Text Analysis", pattern = "Computational")
[1] 1
str_count("Addressing Contemporary Societal Challenges", pattern = "Computational")
[1] 0
As mentioned earlier, these functions are vectorized: when you pass a vector of strings, R returns a vector with one result per element:
str_count(my_course$Course, pattern = 'Computational')
[1] 1 0 0 0
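Because the result has one entry per input string, it can be used directly to subset the original vector. The sketch below illustrates this; the vector name courses and the search word "Study" are chosen here purely for illustration.

```r
library(stringr)

# The four course names from my_course, stored as a plain character vector
courses <- c("Computational Text Analysis",
             "Addressing Contemporary Societal Challenges",
             "Independent Study",
             "Behavioral Economics and Psychology")

# str_detect() returns one TRUE/FALSE per element ...
has_study <- str_detect(courses, pattern = "Study")

# ... so the logical vector can index the original vector
matching_courses <- courses[has_study]
matching_courses
```

The same logical vector could also be passed to dplyr's filter() when the strings live in a data frame column.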
You can combine two strings into one using str_c(). For example, to combine a section and a course name:
str_c("PPE 4000-302", "Computational Text Analysis")
[1] "PPE 4000-302Computational Text Analysis"
To make it more readable by adding a space between the section and the course name, use the sep argument in str_c():
str_c("PPE 4000-302", "Computational Text Analysis", sep = " ")
[1] "PPE 4000-302 Computational Text Analysis"
You can apply this to an entire data frame and create a new variable, Full_name, combining the section and course name:
my_course <- my_course %>% mutate(Full_name = str_c(Section, Course, sep = " "))
my_course
Course Section
1 Computational Text Analysis PPE 4000-302
2 Addressing Contemporary Societal Challenges PPE 4600-301
3 Independent Study PPE 3999
4 Behavioral Economics and Psychology PSYC 2750-401
email max_enroll
1 phsieh@sas.upenn.edu 6
2 phsieh@sas.upenn.edu 18
3 phsieh@sas.upenn.edu NA
4 phsieh@sas.upenn.edu 60
Full_name
1 PPE 4000-302 Computational Text Analysis
2 PPE 4600-301 Addressing Contemporary Societal Challenges
3 PPE 3999 Independent Study
4 PSYC 2750-401 Behavioral Economics and Psychology
To split a string into individual words, use str_split() and specify pattern = " " to split by spaces:
str_split("Computational Text Analysis", pattern = " ")
[[1]]
[1] "Computational" "Text" "Analysis"
Note that the result is a list with one element, which is a vector containing the words. To access a specific word, first access the vector using [[]] and then access the word using []. For example, to get “Text” from the result:
str_split("Computational Text Analysis", pattern = " ")[[1]][2]
[1] "Text"
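The two-step indexing can be easier to follow when the intermediate list is stored first. The sketch below also shows one way to pull the same position out of several strings at once; the variable names and the use of base R's sapply() are illustrative choices, not the only approach.

```r
library(stringr)

# str_split() returns a list with one character vector per input string
words <- str_split("Computational Text Analysis", pattern = " ")

# [[1]] pulls the vector out of the list; [2] then picks its second word
second_word <- words[[1]][2]

# With several inputs the list has several vectors;
# sapply() takes the first word from each of them
titles <- c("Computational Text Analysis", "Independent Study")
first_words <- sapply(str_split(titles, pattern = " "), function(w) w[1])
first_words
```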
To replace a pattern within a string, use str_replace_all(). For example, to replace “@” in an email address with “ at ”:
str_replace_all("phsieh@sas.upenn.edu", pattern = "@", replacement = " at ")
[1] "phsieh at sas.upenn.edu"
Now suppose we want to replace “.” with “ dot ” in an email address:
str_replace_all("phsieh@sas.upenn.edu", pattern = ".", replacement = " dot ")
[1] " dot dot dot dot dot dot dot dot dot dot dot dot dot dot dot dot dot dot dot dot "
Oops! What happened? Let’s explore regular expressions to understand this behavior!
Regular Expression (RegEx)
Regular expressions are patterns used to match character combinations in strings. They allow us to match strings based on more flexible and generalizable patterns.
Here are some key categories of regular expression syntax:
- Matching characters
- Character sets (Alternatives)
- Anchors
- Quantifiers
- Lookarounds
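As a rough orientation, the sketch below gives one small stringr call per category, all applied to the section string "PPE 4000-302". The specific patterns are illustrative picks and cover only a fraction of the syntax in each category.

```r
library(stringr)

x <- "PPE 4000-302"

# Matching characters: \\d matches a single digit (first match is returned)
first_digit <- str_extract(x, "\\d")

# Character sets: [A-Z] matches any one uppercase letter
first_upper <- str_extract(x, "[A-Z]")

# Anchors: ^ ties the pattern to the start of the string
starts_with_ppe <- str_detect(x, "^PPE")

# Quantifiers: {4} requires exactly four digits in a row
four_digits <- str_extract(x, "[0-9]{4}")

# Lookarounds: (?<= ) requires a space just before the match without including it
after_space <- str_extract(x, "(?<= )[0-9]+")
```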
For example, if we want to extract the subject “PPE” from “PPE 4000-302,” we can use str_extract() with the pattern "^[A-Z]{3}". This pattern matches three uppercase letters at the start of the string:
str_extract("PPE 4000-302", pattern = "^[A-Z]{3}")
[1] "PPE"
or
str_extract("PPE 4000-302", pattern = "^[:upper:]{3}")
[1] "PPE"
When applied to all course sections, like this:
str_extract(my_course$Section, pattern = "^[A-Z]{3}")
[1] "PPE" "PPE" "PPE" "PSY"
It returns only “PSY” for “PSYC 2750-401”, because we specified the pattern to match exactly the first three capital letters. To generalize this and extract one or more capital letters from the beginning, we can modify the pattern:
str_extract(my_course$Section, pattern = "^[A-Z]+")
[1] "PPE" "PPE" "PPE" "PSYC"
If we apply this to course names, we get only the first letter of each name, because the pattern matches a run of capital letters at the start of the string and each name's second character is lowercase:
str_extract(my_course$Course, pattern = "^[A-Z]+")
[1] "C" "A" "I" "B"
Now, let’s revisit the str_replace_all("phsieh@sas.upenn.edu", pattern = ".", replacement = " dot ") issue. It replaced every character with “ dot ” because, in RegEx, . matches any character except a newline. To match a literal period, we need to escape the . with a backslash (\). However, since \ is itself an escape character in R strings, the backslash must also be escaped, so it is written as \\.
To fix the pattern, we should write:
str_replace_all("phsieh@sas.upenn.edu", pattern = "\\.", replacement = " dot ")
[1] "phsieh@sas dot upenn dot edu"
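To see the two layers of escaping, it can help to print the pattern the way the regex engine receives it. The sketch below also shows stringr's fixed() modifier, an alternative not used elsewhere in this lecture, which treats the pattern as plain text instead of a regular expression.

```r
library(stringr)

email <- "phsieh@sas.upenn.edu"

# writeLines() prints the string after R has processed its own escapes,
# i.e. the two characters \. that the regex engine actually sees
writeLines("\\.")

# Escaped dot: matches only a literal period
escaped <- str_replace_all(email, pattern = "\\.", replacement = " dot ")

# fixed() switches off regex interpretation, so "." is just a period
literal <- str_replace_all(email, pattern = fixed("."), replacement = " dot ")

identical(escaped, literal)
```

Both calls should produce the same string; fixed() is mainly convenient when the pattern contains several special characters and no regex features are needed.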
We can also use RegEx to count the number of words in a string. "\\w" matches any word character (letters, digits, and the underscore), so the pattern "\\w+" matches a run of one or more word characters:
str_count(my_course$Course, '\\w+')
[1] 3 4 2 4
str_extract() returns only the first match in each string:
str_extract(my_course$Section, pattern="[0-9]")
[1] "4" "4" "3" "2"
str_extract_all() returns all the matches as a list of vectors, one vector per input string:
str_extract_all(my_course$Section, pattern="[0-9]")
[[1]]
[1] "4" "0" "0" "0" "3" "0" "2"
[[2]]
[1] "4" "6" "0" "0" "3" "0" "1"
[[3]]
[1] "3" "9" "9" "9"
[[4]]
[1] "2" "7" "5" "0" "4" "0" "1"
We can also use str_extract_all() to extract all the words. For example:
str_extract_all("Computational Text Analysis", pattern="\\w+")
[[1]]
[1] "Computational" "Text" "Analysis"
You can see that this gives the same result as str_split("Computational Text Analysis", pattern = " ").
If we want to extract the section number after “-” from the section column, we can extract the number at the end:
str_extract(my_course$Section, pattern="[0-9]+$")
[1] "302" "301" "3999" "401"
However, a course without a section number returns its course number instead (here, “3999”). We can use a positive lookbehind, (?<=-), to require that the digits be immediately preceded by “-”:
str_extract(my_course$Section, pattern="(?<=-)[0-9]+$")
[1] "302" "301" NA "401"
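The mirror image of a lookbehind is a lookahead, written (?=...), which constrains what comes after the match. As a sketch under that assumption, the patterns below pull out the course number instead of the section number; the variable names are illustrative.

```r
library(stringr)

sections <- c("PPE 4000-302", "PPE 4600-301", "PPE 3999", "PSYC 2750-401")

# Lookahead: digits that are immediately followed by "-"
# (courses without a section number give NA)
course_num_if_sectioned <- str_extract(sections, "[0-9]+(?=-)")

# Lookbehind: digits that are immediately preceded by a space
# (works whether or not a section number follows)
course_num <- str_extract(sections, "(?<= )[0-9]+")
course_num
```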
Exercise
- Use str_extract() and a regular expression (RegEx) to extract the first word from each of the four course names.
- Use str_extract_all() and a regular expression (RegEx) to extract only the first word from each of the four course names. Ensure that the regular expression directly extracts the first word, rather than selecting the first item from the output.
Dictionary-Based Methods
A dictionary-based method is used to measure variables from unstructured text by relying on predefined lexicons—lists of words related to specific concepts. This method quantifies a variable by counting how often words from the lexicon appear in the text. Dictionary-based methods have been widely used in psycholinguistics and often serve as a baseline to evaluate machine learning models designed to measure the same concepts. One of the most well-known dictionaries is Linguistic Inquiry and Word Count (LIWC), though it is proprietary.
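As a minimal sketch of the idea, the example below counts hits from a tiny lexicon and divides by the number of words. The lexicon and the three texts are made up for illustration and are not taken from LIWC or any published dictionary; normalizing by word count is one common choice, not the only one.

```r
library(stringr)

# Hypothetical toy lexicon and texts, invented for this illustration
lexicon <- c("happy", "joy", "love")
texts <- c("I love this happy song", "Nothing to see here", "Joy, joy, joy")

# Collapse the lexicon into one alternation; \\b keeps matches to whole words,
# and ignore_case = TRUE lets "Joy" match "joy"
pattern <- regex(str_c("\\b(", str_c(lexicon, collapse = "|"), ")\\b"),
                 ignore_case = TRUE)

n_hits  <- str_count(texts, pattern)   # lexicon words per text
n_words <- str_count(texts, "\\w+")    # rough total word count per text
score   <- n_hits / n_words            # share of words that are in the lexicon
score
```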
Today, we will analyze video transcripts from TED-Ed. Please download the dataset from https://www.kaggle.com/datasets/viratchauhan/ted-ed and load it into R:
teded <- read_csv("teded.csv")
Rows: 2109 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Title, Link, Caption
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
First, let’s count how many words are in each transcript:
str_count(teded$Caption, "\\w+")
[1] 2835 674 667 654 681 324 721 670 2233 666 809 699 1069 674
[15] 617 3910 803 752 672 692 1741 766 737 666 797 843 2056 1029
[29] 615 488 658 714 678 699 738 685 685 648 1534 191 1557 729
[43] 1940 677 643 728 1488 1738 575 854 655 679 741 703 659 639
[57] 661 721 683 805 627 724 487 659 703 592 745 610 1871 620
[71] 585 661 672 743 653 756 704 1311 754 642 2014 748 716 3002
[85] 705 751 710 3431 3301 709 627 754 661 1040 708 714 626 769
[99] 678 643 714 693 571 524 2554 711 691 679 2789 726 171 727
[113] 197 727 700 1219 1687 651 2691 688 704 689 801 585 702 681
[127] 1507 633 661 167 606 588 618 3638 754 527 703 634 3880 3210
[141] 727 806 1014 2023 693 649 2031 683 2634 1130 2968 2475 645 755
[155] 608 654 622 3142 827 2424 710 745 664 618 670 607 2528 2612
[169] 645 632 717 650 204 644 689 610 868 498 686 529 708 686
[183] 669 741 922 1896 889 3148 739 561 669 1877 720 684 872 896
[197] 689 748 711 686 709 461 652 836 714 3055 538 627 2309 371
[211] 1389 537 1886 2319 626 687 671 1652 649 676 776 461 701 2118
[225] 850 711 663 722 711 687 924 1637 603 731 1660 688 656 424
[239] 703 4631 672 664 1984 2911 2312 2622 582 855 713 584 1073 873
[253] 530 2367 2513 695 137 694 690 491 694 873 624 1505 722 605
[267] 794 690 866 646 899 658 666 1326 654 646 1304 860 1919 676
[281] 368 518 704 686 619 2053 646 713 623 515 746 738 599 706
[295] 2144 675 720 3576 663 638 205 3250 462 2831 673 738 681 697
[309] 704 757 690 680 744 757 811 700 550 644 658 773 676 1420
[323] 766 730 762 587 191 3141 681 725 676 618 3646 1589 667 894
[337] 795 1612 525 805 838 672 692 203 233 116 736 601 639 656
[351] 693 683 663 659 183 869 3026 3131 403 1696 598 1994 3983 607
[365] 646 2582 887 2718 3196 972 705 697 3391 673 744 712 196 616
[379] 877 634 1243 820 726 931 609 646 3003 668 632 2887 671 616
[393] 480 679 897 704 1666 1225 743 3856 707 3368 642 2754 650 658
[407] 688 628 661 549 658 3106 716 4191 851 696 607 567 696 714
[421] 841 816 715 473 555 574 634 678 621 366 658 672 672 685
[435] 751 3209 730 683 1305 1082 743 797 682 816 210 675 656 1104
[449] 703 1128 698 713 685 2643 599 2062 586 2301 2804 660 711 653
[463] 734 662 638 713 685 751 712 689 697 713 525 2254 661 661
[477] 502 672 664 679 664 524 629 649 662 803 700 1053 659 1644
[491] 3331 740 692 818 834 3055 651 575 1070 791 583 878 342 671
[505] 4103 2234 2159 607 724 680 718 632 2483 715 629 658 678 1071
[519] 303 780 545 660 684 503 465 613 1593 586 693 528 695 627
[533] 769 429 2965 662 681 1856 688 689 637 719 650 453 768 523
[547] 1197 901 534 690 676 680 836 595 610 670 708 759 708 701
[561] 935 571 690 727 731 607 646 668 767 641 686 681 671 1194
[575] 2491 677 617 749 610 700 641 3168 783 708 721 663 487 518
[589] 637 2115 2347 621 1863 754 626 604 1041 691 721 657 660 1382
[603] 676 553 716 996 722 2579 824 679 1204 471 834 679 640 762
[617] 2194 2042 616 626 2634 286 621 3299 195 699 2150 663 721 677
[631] 665 1071 665 848 759 679 687 2087 656 701 636 1455 922 495
[645] 730 715 666 661 730 3783 828 753 769 3254 681 3244 642 591
[659] 733 696 701 795 508 3058 648 668 3106 740 663 1464 677 3905
[673] 401 3519 336 601 1727 716 652 698 581 796 708 2976 2156 768
[687] 610 501 715 593 678 190 2286 3265 516 669 912 511 676 860
[701] 674 695 1680 731 630 463 2332 461 638 625 2987 692 3869 644
[715] 587 661 811 2426 818 671 587 650 632 626 795 3375 680 610
[729] 665 775 736 1088 1210 497 964 647 721 437 650 656 706 525
[743] 465 634 745 697 674 976 3167 544 914 1978 2248 640 682 2128
[757] 2527 1891 3879 697 874 694 1030 641 673 745 843 736 615 818
[771] 174 3038 813 739 3325 666 790 3102 706 787 480 841 1827 472
[785] 1040 627 3846 2522 586 172 769 711 704 3041 637 644 362 2298
[799] 658 556 806 668 667 653 1370 2591 563 693 508 652 647 564
[813] 2558 659 686 683 716 620 666 561 716 646 4018 679 486 512
[827] 642 585 646 627 653 2934 649 669 215 1542 1332 671 694 2235
[841] 599 781 605 1866 651 723 693 2370 1735 1104 612 200 660 633
[855] 653 711 659 3165 729 588 2600 806 3644 596 553 536 911 715
[869] 493 654 671 684 654 3318 1774 654 652 684 772 658 622 1220
[883] 766 658 2406 1368 749 674 541 722 778 659 725 1316 561 2000
[897] 606 632 754 669 775 374 850 686 472 2545 688 188 1060 738
[911] 681 674 181 628 675 641 2805 678 240 2969 910 647 691 731
[925] 726 680 592 634 1352 697 656 302 458 673 354 911 1055 684
[939] 666 640 631 531 696 563 969 647 779 220 281 669 3105 574
[953] 466 652 634 716 1254 649 824 798 630 722 683 633 702 556
[967] 1563 692 784 674 3076 784 1814 578 619 678 594 675 714 676
[981] 2338 614 689 1353 600 695 655 701 568 3227 737 693 192 515
[995] 644 672 1037 658 592 214 200 3287 694 748 447 655 674 653
[1009] 695 527 2591 622 662 974 721 730 1287 958 739 465 721 678
[1023] 706 696 727 545 730 656 753 2965 2846 680 486 765 678 656
[1037] 990 733 694 879 691 715 564 362 516 2668 2006 505 1128 686
[1051] 661 695 897 656 611 653 624 688 726 2204 659 1941 757 631
[1065] 1555 638 639 526 676 672 618 2015 2194 670 618 684 1447 2214
[1079] 741 685 668 611 846 1255 686 762 152 729 660 696 3003 685
[1093] 627 499 2538 701 649 728 133 983 519 1496 2272 694 506 2853
[1107] 2038 652 646 1955 615 2958 647 586 699 560 993 721 276 693
[1121] 785 1148 693 634 2511 1498 647 3198 3122 724 2991 685 728 679
[1135] 705 616 648 580 594 1459 681 747 1400 3389 684 501 3337 901
[1149] 528 401 682 711 2764 635 2469 738 1150 695 844 638 222 607
[1163] 663 721 677 1358 3578 3462 717 668 749 2913 700 706 397 731
[1177] 1132 495 590 667 652 3020 756 661 688 659 767 653 631 639
[1191] 673 626 750 1406 818 773 534 671 668 3310 983 963 678 721
[1205] 595 806 697 203 3500 542 661 707 597 3293 2462 1363 672 861
[1219] 744 639 1850 683 583 612 229 702 416 651 688 3560 639 2149
[1233] 847 671 699 773 2687 586 814 356 596 2222 1641 561 612 3522
[1247] 722 684 729 658 724 1232 2753 742 2794 985 637 680 731 586
[1261] 554 587 692 657 1013 620 2654 885 1438 738 1008 2070 660 673
[1275] 547 477 336 838 685 720 659 655 697 593 780 2584 746 685
[1289] 559 654 679 745 698 706 645 799 642 1490 727 835 797 3238
[1303] 3109 745 640 3064 521 768 532 656 667 753 3454 686 707 1480
[1317] 650 686 851 2431 793 704 689 634 442 688 629 935 657 2384
[1331] 528 667 665 678 652 663 715 690 1552 626 778 683 1225 698
[1345] 156 355 1244 1512 684 2641 778 683 4504 2915 352 206 3121 739
[1359] 669 3274 2222 1206 2385 671 2424 659 659 737 631 833 1441 643
[1373] 3075 1905 152 1177 634 717 656 319 749 861 706 662 813 632
[1387] 719 2830 431 628 700 650 751 558 704 701 1978 717 614 680
[1401] 654 802 707 659 626 708 639 3671 1221 700 561 686 916 726
[1415] 776 631 3267 1053 633 832 647 671 663 636 545 688 511 2033
[1429] 685 756 302 673 678 703 691 479 666 450 429 805 412 650
[1443] 673 765 741 683 764 654 662 1767 690 755 713 708 839 1426
[1457] 982 676 714 3043 825 3238 1120 665 710 151 602 685 697 556
[1471] 544 574 666 628 602 1765 882 205 1262 667 576 2008 829 590
[1485] 150 486 674 663 676 686 3333 686 3104 795 648 3634 3320 599
[1499] 656 645 852 704 3195 722 658 2112 2907 675 2670 578 563 628
[1513] 569 618 847 1265 1327 703 673 773 670 696 648 713 549 710
[1527] 658 657 729 576 684 724 705 641 717 665 1747 718 686 688
[1541] 1538 692 649 625 1744 648 663 1230 642 654 217 643 2434 779
[1555] 654 755 673 880 592 2362 481 1422 3557 646 1538 633 750 718
[1569] 1130 638 747 639 684 1198 719 708 8 618 621 2091 694 2586
[1583] 4160 570 637 2294 758 776 653 654 650 633 1098 671 659 600
[1597] 567 556 603 1368 554 713 662 770 660 599 670 598 755 2236
[1611] 697 734 1026 669 1433 664 784 211 652 702 607 1116 669 1576
[1625] 650 638 616 622 666 752 762 2746 732 608 642 707 701 1302
[1639] 654 652 2204 815 704 2865 662 177 670 670 1070 693 911 791
[1653] 700 622 449 741 782 785 665 2694 999 685 543 715 553 663
[1667] 586 207 737 625 670 1427 825 694 478 804 693 691 719 606
[1681] 668 702 641 440 714 1545 737 636 621 667 703 2533 676 697
[1695] 2357 3597 691 685 2607 412 3349 2720 569 536 604 669 627 2544
[1709] 685 1689 2915 1992 3850 823 651 2646 698 697 823 676 2180 706
[1723] 609 665 626 685 722 800 671 659 549 799 778 696 579 748
[1737] 696 652 664 704 722 664 1322 665 788 706 476 610 721 675
[1751] 1781 645 737 687 2442 567 1180 650 733 1773 683 167 630 713
[1765] 852 634 646 586 523 925 917 2287 860 592 666 648 232 615
[1779] 610 2513 713 3483 715 648 2734 653 611 1462 660 728 715 3925
[1793] 598 741 692 699 666 704 538 484 660 676 839 628 667 582
[1807] 353 637 736 703 717 690 520 678 795 854 645 2747 529 593
[1821] 683 675 663 2863 682 706 344 725 1001 665 672 682 705 677
[1835] 3671 631 2386 769 689 648 1343 895 695 732 715 661 2624 673
[1849] 1986 602 705 634 716 1011 692 805 500 1249 659 642 622 722
[1863] 944 857 194 990 674 650 635 702 660 728 725 688 1018 657
[1877] 670 721 889 642 566 698 685 698 712 847 192 784 836 715
[1891] 543 3048 668 698 510 3324 2423 558 714 757 646 699 614 727
[1905] 706 938 954 2951 582 773 608 1082 447 676 656 626 654 3646
[1919] 167 2452 732 1014 724 524 473 777 206 880 908 1060 687 2736
[1933] 1336 757 214 409 1447 1248 693 682 686 262 735 488 701 622
[1947] 1191 645 613 659 620 523 684 661 676 708 798 900 710 679
[1961] 182 629 3056 641 3081 686 269 648 1176 654 664 3062 430 571
[1975] 2174 621 436 729 689 685 1108 678 694 4385 699 3186 1047 686
[1989] 546 752 627 827 473 2685 884 767 642 698 1205 668 700 715
[2003] 722 654 706 3878 651 3159 1207 678 866 707 3562 2272 252 664
[2017] 640 603 641 590 623 1054 644 1650 568 733 672 663 560 1389
[2031] 667 701 831 231 671 511 661 2711 651 2676 2583 609 594 1700
[2045] 2837 2462 643 647 2123 1307 665 3042 552 678 715 609 817 555
[2059] 694 605 240 1914 1294 696 728 1247 685 611 664 131 606 1053
[2073] 636 681 714 662 713 773 785 1818 562 592 642 2290 986 663
[2087] 1104 624 741 669 795 703 178 601 778 3072 787 547 589 593
[2101] 607 643 380 708 626 656 902 712 3444
Is this an accurate word count? Let’s examine two sentences from the third video, “The genius of Mendeleev’s periodic table - Lou Serico.” Double-check the word count generated by "\\w+":
str_count("A cubic centimeter of it weighs 5.9 grams.", "\\w+")
[1] 9
It returns 9, but it should be 8. This is because the period in “5.9” causes RegEx to treat “5” and “9” as two separate words.
Let’s check another example:
str_count("It's a massive slab of human genius, up there with the Taj Mahal, the Mona Lisa, and the ice cream sandwich -- and the table's creator, Dmitri Mendeleev, is a bonafide science hall-of-famer.", "\\w+")
[1] 36
It returns 36, but it should be 32. This discrepancy occurs because words containing punctuation, such as “It's” and “hall-of-famer”, are split into several pieces by "\\w+". To fix this, we need a more comprehensive pattern. Let’s review the entire transcript. What pattern do you think would capture the correct word count?
teded$Caption[3]
[1] "Translator: tom carter Reviewer: Bedirhan Cinar The periodic table is instantly recognizable. It's not just in every chemistry lab worldwide, it's found on t-shirts, coffee mugs, and shower curtains. But the periodic table isn't just another trendy icon. It's a massive slab of human genius, up there with the Taj Mahal, the Mona Lisa, and the ice cream sandwich -- and the table's creator, Dmitri Mendeleev, is a bonafide science hall-of-famer. But why? What's so great about him and his table? Is it because he made a comprehensive list of the known elements? Nah, you don't earn a spot in science Valhalla just for making a list. Besides, Mendeleev was far from the first person to do that. Is it because Mendeleev arranged elements with similar properties together? Not really, that had already been done too. So what was Mendeleev's genius? Let's look at one of the first versions of the periodic table from around 1870. Here we see elements designated by their two-letter symbols arranged in a table. Check out the entry of the third column, fifth row. There's a dash there. From that unassuming placeholder springs the raw brilliance of Mendeleev. That dash is science. By putting that dash there, Dmitri was making a bold statement. He said -- and I'm paraphrasing here -- Y'all haven't discovered this element yet. In the meantime, I'm going to give it a name. It's one step away from aluminum, so we'll call it eka-aluminum, \"eka\" being Sanskrit for one. Nobody's found eka-aluminum yet, so we don't know anything about it, right? Wrong! Based on where it's located, I can tell you all about it. First of all, an atom of eka-aluminum has an atomic weight of 68, about 68 times heavier than a hydrogen atom. When eka-aluminum is isolated, you'll see it's a solid metal at room temperature. It's shiny, it conducts heat really well, it can be flattened into a sheet, stretched into a wire, but its melting point is low. Like, freakishly low. 
Oh, and a cubic centimeter of it will weigh six grams. Mendeleev could predict all of these things simply from where the blank spot was, and his understanding of how the elements surrounding it behave. A few years after this prediction, a French guy named Paul Emile Lecoq de Boisbaudran discovered a new element in ore samples and named it gallium after Gaul, the historical name for France. Gallium is one step away from aluminum on the periodic table. It's eka-aluminum. So were Mendeleev's predictions right? Gallium's atomic weight is 69.72. A cubic centimeter of it weighs 5.9 grams. it's a solid metal at room temperature, but it melts at a paltry 30 degrees Celcius, 85 degrees Fahrenheit. It melts in your mouth and in your hand. Not only did Mendeleev completely nail gallium, he predicted other elements that were unknown at the time: scandium, germanium, rhenium. The element he called eka-manganese is now called technetium. Technetium is so rare it couldn't be isolated until it was synthesized in a cyclotron in 1937, almost 70 years after Dmitri predicted its existence, 30 years after he died. Dmitri died without a Nobel Prize in 1907, but he wound up receiving a much more exclusive honor. In 1955, scientists at UC Berkeley successfully created 17 atoms of a previously undiscovered element. This element filled an empty spot in the perodic table at number 101, and was officially named Mendelevium in 1963. There have been well over 800 Nobel Prize winners, but only 15 scientists have an element named after them. So the next time you stare at a periodic table, whether it's on the wall of a university classroom or on a five-dollar coffee mug, Dmitri Mendeleev, the architect of the periodic table, will be staring back."
First, we can include "[a-zA-Z]([a-zA-Z]|\'|-)*", which captures words that start with a letter followed by zero or more letters, apostrophes, or hyphens:
str_count("It's a massive slab of human genius, up there with the Taj Mahal, the Mona Lisa, and the ice cream sandwich -- and the table's creator, Dmitri Mendeleev, is a bonafide science hall-of-famer.", "([a-zA-Z]([a-zA-Z]|\'|-)*)|([0-9]+\\.?[0-9]*)")
[1] 32
To capture numbers with decimals, we can add the pattern "[0-9]+\\.?[0-9]*", which matches one or more digits followed by zero or one period and zero or more digits. Since both patterns represent words, we combine them with the "|" operator (meaning “or”).
str_count("A cubic centimeter of it weighs 5.9 grams.", "([a-zA-Z]([a-zA-Z]|\'|-)*)|([0-9]+\\.?[0-9]*)")
[1] 8
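A count alone does not show which pieces of text were matched. One way to check a pattern, sketched below, is to swap str_count() for str_extract_all() and inspect the tokens; the names word_pattern and tokens are illustrative.

```r
library(stringr)

# Same combined pattern as above, stored once so it can be reused
word_pattern <- "([a-zA-Z]([a-zA-Z]|'|-)*)|([0-9]+\\.?[0-9]*)"

sentence <- "A cubic centimeter of it weighs 5.9 grams."

# [[1]] takes the vector of matches for the single input sentence
tokens <- str_extract_all(sentence, word_pattern)[[1]]
tokens

# The number of tokens equals what str_count() reports
length(tokens)
```

Inspecting tokens this way also exposes edge cases: because the period in the number pattern is optional, a number at the end of a sentence (such as “1963.”) keeps its trailing period in the token, although the word count is unaffected.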
Now we can apply this pattern to count the words in all transcripts:
teded <- teded %>%
  mutate(n_word = str_count(Caption, "([a-zA-Z]([a-zA-Z]|\'|-)*)|([0-9]+\\.?[0-9]*)"))
Next, let’s use the affect and moral dictionaries from Brady et al.’s (2017) paper “Emotion shapes the diffusion of moralized content in social networks.” We can load the dictionaries directly from a URL. First, we’ll load the affect dictionary. Since the first line is not a header, we set header = FALSE:
dict_affect <- read.delim("https://osf.io/download/k3wnz/", header = FALSE)
For each word in the dictionary, the first letter could be either uppercase or lowercase, and we need to create a RegEx pattern to account for both cases. We can extract the first letter of each word using str_sub():
str_sub("war", 1, 1)
[1] "w"
To make it uppercase, we use str_to_upper():
str_to_upper(str_sub("war", 1, 1))
[1] "W"
To match both cases in RegEx, we use the "|" operator and group them in parentheses:
str_c("(", str_sub("war", 1, 1), "|", str_to_upper(str_sub("war", 1, 1)), ")")
[1] "(w|W)"
We will use this pattern to modify the first letter in the word:
str_replace("war", str_sub("war", 1, 1), str_c("(", str_sub("war", 1, 1), "|", str_to_upper(str_sub("war", 1, 1)), ")"))
[1] "(w|W)ar"
Some words in the dictionary include a "*" (e.g., "terribl*"), which indicates that any word starting with “terribl” should be counted. We need to transform such patterns into RegEx by replacing "*" with "\\w*". Here’s how to do this for "terribl*":
str_replace("terribl*", pattern = "\\*", replacement = "\\\\w*")
[1] "terribl\\w*"
For words without "*", we add "(?!\\w)" to prevent partial matches:
str_count("But during the war, the siblings had a terrible argument—a fight so explosive it split the family business in two.", pattern = "war(?!\\w)")
[1] 1
str_count("The weather is warm.", pattern = "war(?!\\w)")
[1] 0
Last, we need to ensure that the pattern matches only from the beginning of a word, so that, for example, “ugh” does not match inside “enough”. We can do this by adding a word boundary "\\b" before the pattern:
str_count("And these minimal group experiments suggested that simply being categorized as part of a group is enough to link that group to a person’s sense of self.", pattern="ugh")
[1] 1
str_count("And these minimal group experiments suggested that simply being categorized as part of a group is enough to link that group to a person’s sense of self.", pattern="\\bugh")
[1] 0
str_c("\\b", "(w|W)ar")
[1] "\\b(w|W)ar"
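The individual steps above (case-insensitive first letter, "*" wildcard, negative lookahead, and word boundary) can be sketched as a single helper. The function name dict_to_regex() and the three test words are hypothetical and chosen only for illustration; base R's ifelse() is used so that the sketch depends on stringr alone.

```r
library(stringr)

# dict_to_regex() is a hypothetical helper name, not part of stringr
dict_to_regex <- function(words) {
  first <- str_sub(words, 1, 1)   # first letter of each entry
  rest  <- str_sub(words, 2)      # everything after the first letter

  # (w|W)ar style pattern: accept either case for the first letter
  pattern <- str_c("(", first, "|", str_to_upper(first), ")", rest)

  # A trailing * becomes \w*; other entries must not be followed by a word character
  has_star <- str_detect(pattern, "\\*$")
  pattern <- ifelse(has_star,
                    str_replace(pattern, "\\*$", "\\\\w*"),
                    str_c(pattern, "(?!\\w)"))

  # Require a word boundary before every entry
  str_c("\\b", pattern)
}

patterns <- dict_to_regex(c("war", "terribl*", "ugh"))
patterns

# One count per dictionary entry for a single sentence
hits <- str_count("The weather is warm, but the war was terrible.", patterns)
hits
```

Here “warm” and “weather” do not count as “war”, while “terrible” is picked up by the wildcard entry.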
Now, let’s combine everything. First, we extract the first letter for each word in the dictionary and create a RegEx pattern that matches both cases:
dict_affect <- dict_affect %>%
  mutate(init_letter = str_sub(V1, 1, 1),
         init_letter_both = str_c("(", init_letter, "|", str_to_upper(init_letter), ")"))
Next, we create a regex column by modifying the first letter:
dict_affect <- dict_affect %>% mutate(regex = str_replace(V1, init_letter, init_letter_both))
We detect if a word has "*" using str_detect(). Since "*" is a special character, we use "\\*" in the pattern. Let’s experiment:
str_detect("abandon*", pattern = "\\*")
[1] TRUE
str_detect("accept", pattern = "\\*")
[1] FALSE
We modify the regex column based on whether the pattern has "*". If str_detect() returns TRUE, we replace "*" with "\\\\w*"; otherwise, we append "(?!\\w)":
dict_affect <- dict_affect %>%
  mutate(regex = if_else(str_detect(regex, pattern = "\\*"),
                         str_replace(regex, pattern = "\\*", replacement = "\\\\w*"),
                         str_c(regex, "(?!\\w)")))
Now we have the RegEx patterns for each word!
dict_affect$regex
[1] "(a|A)bandon\\w*" "(a|A)buse\\w*"
[3] "(a|A)busi\\w*" "(a|A)ccept(?!\\w)"
[5] "(a|A)ccepta\\w*" "(a|A)ccepted(?!\\w)"
[7] "(a|A)ccepting(?!\\w)" "(a|A)ccepts(?!\\w)"
[9] "(a|A)che\\w*" "(a|A)ching(?!\\w)"
[11] "(a|A)ctive\\w*" "(a|A)dmir\\w*"
[13] "(a|A)dor\\w*" "(a|A)dvantag\\w*"
[15] "(a|A)dventur\\w*" "(a|A)dvers\\w*"
[17] "(a|A)ffection\\w*" "(a|A)fraid(?!\\w)"
[19] "(a|A)ggravat\\w*" "(a|A)ggress\\w*"
[21] "(a|A)gitat\\w*" "(a|A)goniz\\w*"
[23] "(a|A)gony(?!\\w)" "(a|A)gree(?!\\w)"
[25] "(a|A)greeab\\w*" "(a|A)greed(?!\\w)"
[27] "(a|A)greeing(?!\\w)" "(a|A)greement\\w*"
[29] "(a|A)grees(?!\\w)" "(a|A)larm\\w*"
[31] "(a|A)lone(?!\\w)" "(a|A)lright\\w*"
[33] "(a|A)maz\\w*" "(a|A)mor\\w*"
[35] "(a|A)mus\\w*" "(a|A)nger\\w*"
[37] "(a|A)ngr\\w*" "(a|A)nguish\\w*"
[39] "(a|A)nnoy\\w*" "(a|A)ntagoni\\w*"
[41] "(a|A)nxi\\w*" "(a|A)ok(?!\\w)"
[43] "(a|A)path\\w*" "(a|A)ppall\\w*"
[45] "(a|A)ppreciat\\w*" "(a|A)pprehens\\w*"
[47] "(a|A)rgh\\w*" "(a|A)rgu\\w*"
[49] "(a|A)rrogan\\w*" "(a|A)sham\\w*"
[51] "(a|A)ssault\\w*" "(a|A)sshole\\w*"
[53] "(a|A)ssur\\w*" "(a|A)ttachment\\w*"
[55] "(a|A)ttack\\w*" "(a|A)ttract\\w*"
[57] "(a|A)versi\\w*" "(a|A)void\\w*"
[59] "(a|A)ward\\w*" "(a|A)wesome(?!\\w)"
[61] "(a|A)wful(?!\\w)" "(a|A)wkward\\w*"
[63] "(b|B)ad(?!\\w)" "(b|B)ashful\\w*"
[65] "(b|B)astard\\w*" "(b|B)attl\\w*"
[67] "(b|B)eaten(?!\\w)" "(b|B)eaut\\w*"
[69] "(b|B)eloved(?!\\w)" "(b|B)enefic\\w*"
[71] "(b|B)enefit(?!\\w)" "(b|B)enefits(?!\\w)"
[73] "(b|B)enefitt\\w*" "(b|B)enevolen\\w*"
[75] "(b|B)enign\\w*" "(b|B)est(?!\\w)"
[77] "(b|B)etter(?!\\w)" "(b|B)itch\\w*"
[79] "(b|B)itter\\w*" "(b|B)lam\\w*"
[81] "(b|B)less\\w*" "(b|B)old\\w*"
[83] "(b|B)onus\\w*" "(b|B)ore\\w*"
[85] "(b|B)oring(?!\\w)" "(b|B)other\\w*"
[87] "(b|B)rave\\w*" "(b|B)right\\w*"
[89] "(b|B)rillian\\w*" "(b|B)roke(?!\\w)"
[91] "(b|B)rutal\\w*" "(b|B)urden\\w*"
[93] "(c|C)alm\\w*" "(c|C)are(?!\\w)"
[95] "(c|C)ared(?!\\w)" "(c|C)arefree(?!\\w)"
[97] "(c|C)areful\\w*" "(c|C)areless\\w*"
[99] "(c|C)ares(?!\\w)" "(c|C)aring(?!\\w)"
[101] "(c|C)asual(?!\\w)" "(c|C)asually(?!\\w)"
[103] "(c|C)ertain\\w*" "(c|C)halleng\\w*"
[105] "(c|C)hamp\\w*" "(c|C)harit\\w*"
[107] "(c|C)harm\\w*" "(c|C)heat\\w*"
[109] "(c|C)heer\\w*" "(c|C)herish\\w*"
[111] "(c|C)huckl\\w*" "(c|C)lever\\w*"
[113] "(c|C)omed\\w*" "(c|C)omfort\\w*"
[115] "(c|C)ommitment\\w*" "(c|C)ompassion\\w*"
[117] "(c|C)omplain\\w*" "(c|C)ompliment\\w*"
[119] "(c|C)oncerned(?!\\w)" "(c|C)onfidence(?!\\w)"
[121] "(c|C)onfident(?!\\w)" "(c|C)onfidently(?!\\w)"
[123] "(c|C)onfront\\w*" "(c|C)onfus\\w*"
[125] "(c|C)onsiderate(?!\\w)" "(c|C)ontempt\\w*"
[127] "(c|C)ontented\\w*" "(c|C)ontentment(?!\\w)"
[129] "(c|C)ontradic\\w*" "(c|C)onvinc\\w*"
[131] "(c|C)ool(?!\\w)" "(c|C)ourag\\w*"
[133] "(c|C)rap(?!\\w)" "(c|C)rappy(?!\\w)"
[135] "(c|C)raz\\w*" "(c|C)reate\\w*"
[137] "(c|C)reati\\w*" "(c|C)redit\\w*"
[139] "(c|C)ried(?!\\w)" "(c|C)ries(?!\\w)"
[141] "(c|C)ritical(?!\\w)" "(c|C)ritici\\w*"
[143] "(c|C)rude\\w*" "(c|C)ruel\\w*"
[145] "(c|C)rushed(?!\\w)" "(c|C)ry(?!\\w)"
[147] "(c|C)rying(?!\\w)" "(c|C)unt\\w*"
[149] "(c|C)ut(?!\\w)" "(c|C)ute\\w*"
[151] "(c|C)utie\\w*" "(c|C)ynic(?!\\w)"
[153] "(d|D)amag\\w*" "(d|D)amn\\w*"
[155] "(d|D)anger\\w*" "(d|D)aring(?!\\w)"
[157] "(d|D)arlin\\w*" "(d|D)aze\\w*"
[159] "(d|D)ear\\w*" "(d|D)ecay\\w*"
[161] "(d|D)efeat\\w*" "(d|D)efect\\w*"
[163] "(d|D)efenc\\w*" "(d|D)efens\\w*"
[165] "(d|D)efinite(?!\\w)" "(d|D)efinitely(?!\\w)"
[167] "(d|D)egrad\\w*" "(d|D)electabl\\w*"
[169] "(d|D)elicate\\w*" "(d|D)elicious\\w*"
[171] "(d|D)eligh\\w*" "(d|D)epress\\w*"
[173] "(d|D)epriv\\w*" "(d|D)espair\\w*"
[175] "(d|D)esperat\\w*" "(d|D)espis\\w*"
[177] "(d|D)estroy\\w*" "(d|D)estruct\\w*"
[179] "(d|D)etermina\\w*" "(d|D)etermined(?!\\w)"
[181] "(d|D)evastat\\w*" "(d|D)evil\\w*"
[183] "(d|D)evot\\w*" "(d|D)ifficult\\w*"
[185] "(d|D)igni\\w*" "(d|D)isadvantage\\w*"
[187] "(d|D)isagree\\w*" "(d|D)isappoint\\w*"
[189] "(d|D)isaster\\w*" "(d|D)iscomfort\\w*"
[191] "(d|D)iscourag\\w*" "(d|D)isgust\\w*"
[193] "(d|D)ishearten\\w*" "(d|D)isillusion\\w*"
[195] "(d|D)islike(?!\\w)" "(d|D)isliked(?!\\w)"
[197] "(d|D)islikes(?!\\w)" "(d|D)isliking(?!\\w)"
[199] "(d|D)ismay\\w*" "(d|D)issatisf\\w*"
[201] "(d|D)istract\\w*" "(d|D)istraught(?!\\w)"
[203] "(d|D)istress\\w*" "(d|D)istrust\\w*"
[205] "(d|D)isturb\\w*" "(d|D)ivin\\w*"
[207] "(d|D)omina\\w*" "(d|D)oom\\w*"
[209] "(d|D)ork\\w*" "(d|D)oubt\\w*"
[211] "(d|D)read\\w*" "(d|D)ull\\w*"
[213] "(d|D)umb\\w*" "(d|D)ump\\w*"
[215] "(d|D)well\\w*" "(d|D)ynam\\w*"
[217] "(e|E)ager\\w*" "(e|E)ase\\w*"
[219] "(e|E)asie\\w*" "(e|E)asily(?!\\w)"
[221] "(e|E)asiness(?!\\w)" "(e|E)asing(?!\\w)"
[223] "(e|E)asy\\w*" "(e|E)csta\\w*"
[225] "(e|E)fficien\\w*" "(e|E)gotis\\w*"
[227] "(e|E)legan\\w*" "(e|E)mbarrass\\w*"
[229] "(e|E)motion(?!\\w)" "(e|E)motion(?!\\w)"
[231] "(e|E)motional(?!\\w)" "(e|E)mpt\\w*"
[233] "(e|E)ncourag\\w*" "(e|E)nemie\\w*"
[235] "(e|E)nemy\\w*" "(e|E)nerg\\w*"
[237] "(e|E)ngag\\w*" "(e|E)njoy\\w*"
[239] "(e|E)nrag\\w*" "(e|E)ntertain\\w*"
[241] "(e|E)nthus\\w*" "(e|E)nvie\\w*"
[243] "(e|E)nvious(?!\\w)" "(e|E)nvy\\w*"
[245] "(e|E)vil\\w*" "(e|E)xcel\\w*"
[247] "(e|E)xcit\\w*" "(e|E)xcruciat\\w*"
[249] "(e|E)xhaust\\w*" "(f|F)ab(?!\\w)"
[251] "(f|F)abulous\\w*" "(f|F)ail\\w*"
[253] "(f|F)aith\\w*" "(f|F)ake(?!\\w)"
[255] "(f|F)antastic\\w*" "(f|F)atal\\w*"
[257] "(f|F)atigu\\w*" "(f|F)ault\\w*"
[259] "(f|F)avor\\w*" "(f|F)avour\\w*"
[261] "(f|F)ear(?!\\w)" "(f|F)eared(?!\\w)"
[263] "(f|F)earful\\w*" "(f|F)earing(?!\\w)"
[265] "(f|F)earless\\w*" "(f|F)ears(?!\\w)"
[267] "(f|F)eroc\\w*" "(f|F)estiv\\w*"
[269] "(f|F)eud\\w*" "(f|F)iery(?!\\w)"
[271] "(f|F)iesta\\w*" "(f|F)ight\\w*"
[273] "(f|F)ine(?!\\w)" "(f|F)ired(?!\\w)"
[275] "(f|F)latter\\w*" "(f|F)lawless\\w*"
[277] "(f|F)lexib\\w*" "(f|F)lirt\\w*"
[279] "(f|F)lunk\\w*" "(f|F)oe\\w*"
[281] "(f|F)ond(?!\\w)" "(f|F)ondly(?!\\w)"
[283] "(f|F)ondness(?!\\w)" "(f|F)ool\\w*"
[285] "(f|F)orbid\\w*" "(f|F)orgave(?!\\w)"
[287] "(f|F)orgiv\\w*" "(f|F)ought(?!\\w)"
[289] "(f|F)rantic\\w*" "(f|F)reak\\w*"
[291] "(f|F)ree(?!\\w)" "(f|F)reeb\\w*"
[293] "(f|F)reed\\w*" "(f|F)reeing(?!\\w)"
[295] "(f|F)reely(?!\\w)" "(f|F)reeness(?!\\w)"
[297] "(f|F)reer(?!\\w)" "(f|F)rees\\w*"
[299] "(f|F)riend\\w*" "(f|F)right\\w*"
[301] "(f|F)rustrat\\w*" "(f|F)uck(?!\\w)"
[303] "(f|F)ucked\\w*" "(f|F)ucker\\w*"
[305] "(f|F)uckin\\w*" "(f|F)ucks(?!\\w)"
[307] "(f|F)ume\\w*" "(f|F)uming(?!\\w)"
[309] "(f|F)un(?!\\w)" "(f|F)unn\\w*"
[311] "(f|F)urious\\w*" "(f|F)ury(?!\\w)"
[313] "(g|G)eek\\w*" "(g|G)enero\\w*"
[315] "(g|G)entle(?!\\w)" "(g|G)entler(?!\\w)"
[317] "(g|G)entlest(?!\\w)" "(g|G)ently(?!\\w)"
[319] "(g|G)iggl\\w*" "(g|G)iver\\w*"
[321] "(g|G)iving(?!\\w)" "(g|G)lad(?!\\w)"
[323] "(g|G)ladly(?!\\w)" "(g|G)lamor\\w*"
[325] "(g|G)lamour\\w*" "(g|G)loom\\w*"
[327] "(g|G)lori\\w*" "(g|G)lory(?!\\w)"
[329] "(g|G)oddam\\w*" "(g|G)ood(?!\\w)"
[331] "(g|G)oodness(?!\\w)" "(g|G)orgeous\\w*"
[333] "(g|G)ossip\\w*" "(g|G)race(?!\\w)"
[335] "(g|G)raced(?!\\w)" "(g|G)raceful\\w*"
[337] "(g|G)races(?!\\w)" "(g|G)raci\\w*"
[339] "(g|G)rand(?!\\w)" "(g|G)rande\\w*"
[341] "(g|G)ratef\\w*" "(g|G)rati\\w*"
[343] "(g|G)rave\\w*" "(g|G)reat(?!\\w)"
[345] "(g|G)reed\\w*" "(g|G)rief(?!\\w)"
[347] "(g|G)riev\\w*" "(g|G)rim\\w*"
[349] "(g|G)rin(?!\\w)" "(g|G)rinn\\w*"
[351] "(g|G)rins(?!\\w)" "(g|G)ross\\w*"
[353] "(g|G)rouch\\w*" "(g|G)rr\\w*"
[355] "(g|G)uilt\\w*" "(h|H)a(?!\\w)"
[357] "(h|H)aha\\w*" "(h|H)andsom\\w*"
[359] "(h|H)appi\\w*" "(h|H)appy(?!\\w)"
[361] "(h|H)arass\\w*" "(h|H)arm(?!\\w)"
[363] "(h|H)armed(?!\\w)" "(h|H)armful\\w*"
[365] "(h|H)arming(?!\\w)" "(h|H)armless\\w*"
[367] "(h|H)armon\\w*" "(h|H)arms(?!\\w)"
[369] "(h|H)ate(?!\\w)" "(h|H)ated(?!\\w)"
[371] "(h|H)ateful\\w*" "(h|H)ater\\w*"
[373] "(h|H)ates(?!\\w)" "(h|H)ating(?!\\w)"
[375] "(h|H)atred(?!\\w)" "(h|H)azy(?!\\w)"
[377] "(h|H)eartbreak\\w*" "(h|H)eartbroke\\w*"
[379] "(h|H)eartfelt(?!\\w)" "(h|H)eartless\\w*"
[381] "(h|H)eartwarm\\w*" "(h|H)eaven\\w*"
[383] "(h|H)eh\\w*" "(h|H)ell(?!\\w)"
[385] "(h|H)ellish(?!\\w)" "(h|H)elper\\w*"
[387] "(h|H)elpful\\w*" "(h|H)elping(?!\\w)"
[389] "(h|H)elpless\\w*" "(h|H)elps(?!\\w)"
[391] "(h|H)ero\\w*" "(h|H)esita\\w*"
[393] "(h|H)ilarious(?!\\w)" "(h|H)oho\\w*"
[395] "(h|H)omesick\\w*" "(h|H)onest\\w*"
[397] "(h|H)onor\\w*" "(h|H)onour\\w*"
[399] "(h|H)ope(?!\\w)" "(h|H)oped(?!\\w)"
[401] "(h|H)opeful(?!\\w)" "(h|H)opefully(?!\\w)"
[403] "(h|H)opefulness(?!\\w)" "(h|H)opeless\\w*"
[405] "(h|H)opes(?!\\w)" "(h|H)oping(?!\\w)"
[407] "(h|H)orr\\w*" "(h|H)ostil\\w*"
[409] "(h|H)ug(?!\\w)" "(h|H)ugg\\w*"
[411] "(h|H)ugs(?!\\w)" "(h|H)umiliat\\w*"
[413] "(h|H)umor\\w*" "(h|H)umour\\w*"
[415] "(h|H)urra\\w*" "(h|H)urt\\w*"
[417] "(i|I)deal\\w*" "(i|I)diot(?!\\w)"
[419] "(i|I)gnor\\w*" "(i|I)mmoral\\w*"
[421] "(i|I)mpatien\\w*" "(i|I)mpersonal(?!\\w)"
[423] "(i|I)mpolite\\w*" "(i|I)mportan\\w*"
[425] "(i|I)mpress\\w*" "(i|I)mprove\\w*"
[427] "(i|I)mproving(?!\\w)" "(i|I)nadequa\\w*"
[429] "(i|I)ncentive\\w*" "(i|I)ndecis\\w*"
[431] "(i|I)neffect\\w*" "(i|I)nferior\\w* "
[433] "(i|I)nhib\\w*" "(i|I)nnocen\\w*"
[435] "(i|I)nsecur\\w*" "(i|I)nsincer\\w*"
[437] "(i|I)nspir\\w*" "(i|I)nsult\\w*"
[439] "(i|I)ntell\\w*" "(i|I)nterest\\w*"
[441] "(i|I)nterrup\\w*" "(i|I)ntimidat\\w*"
[443] "(i|I)nvigor\\w*" "(i|I)rrational\\w*"
[445] "(i|I)rrita\\w*" "(i|I)solat\\w*"
[447] "(j|J)aded(?!\\w)" "(j|J)ealous\\w*"
[449] "(j|J)erk(?!\\w)" "(j|J)erked(?!\\w)"
[451] "(j|J)erks(?!\\w)" "(j|J)oke\\w*"
[453] "(j|J)oking(?!\\w)" "(j|J)oll\\w*"
[455] "(j|J)oy\\w*" "(k|K)een\\w*"
[457] "(k|K)idding(?!\\w)" "(k|K)ill\\w*"
[459] "(k|K)ind(?!\\w)" "(k|K)indly(?!\\w)"
[461] "(k|K)indn\\w*" "(k|K)iss\\w*"
[463] "(l|L)aidback(?!\\w)" "(l|L)ame\\w*"
[465] "(l|L)augh\\w*" "(l|L)azie\\w*"
[467] "(l|L)azy(?!\\w)" "(l|L)iabilit\\w*"
[469] "(l|L)iar\\w*" "(l|L)ibert\\w*"
[471] "(l|L)ied(?!\\w)" "(l|L)ies(?!\\w)"
[473] "(l|L)ike(?!\\w)" "(l|L)ikeab\\w*"
[475] "(l|L)iked(?!\\w)" "(l|L)ikes(?!\\w)"
[477] "(l|L)iking(?!\\w)" "(l|L)ivel\\w*"
[479] "(L|L)MAO(?!\\w)" "(L|L)OL(?!\\w)"
[481] "(l|L)one\\w*" "(l|L)onging\\w*"
[483] "(l|L)ose(?!\\w)" "(l|L)oser\\w*"
[485] "(l|L)oses(?!\\w)" "(l|L)osing(?!\\w)"
[487] "(l|L)oss\\w*" "(l|L)ost(?!\\w)"
[489] "(l|L)ous\\w*" "(l|L)ove(?!\\w)"
[491] "(l|L)oved(?!\\w)" "(l|L)ovely(?!\\w)"
[493] "(l|L)over\\w*" "(l|L)oves(?!\\w)"
[495] "(l|L)oving\\w*" "(l|L)ow\\w*"
[497] "(l|L)oyal\\w*" "(l|L)uck(?!\\w)"
[499] "(l|L)ucked(?!\\w)" "(l|L)ucki\\w*"
[501] "(l|L)uckless\\w*" "(l|L)ucks(?!\\w)"
[503] "(l|L)ucky(?!\\w)" "(l|L)udicrous\\w*"
[505] "(l|L)ying(?!\\w)" "(m|M)ad(?!\\w)"
[507] "(m|M)addening(?!\\w)" "(m|M)adder(?!\\w)"
[509] "(m|M)addest(?!\\w)" "(m|M)adly(?!\\w)"
[511] "(m|M)agnific\\w*" "(m|M)aniac\\w*"
[513] "(m|M)asochis\\w*" "(m|M)elanchol\\w*"
[515] "(m|M)erit\\w*" "(m|M)err\\w*"
[517] "(m|M)ess(?!\\w)" "(m|M)essy(?!\\w)"
[519] "(m|M)iser\\w*" "(m|M)iss(?!\\w)"
[521] "(m|M)issed(?!\\w)" "(m|M)isses(?!\\w)"
[523] "(m|M)issing(?!\\w)" "(m|M)istak\\w*"
[525] "(m|M)ock(?!\\w)" "(m|M)ocked(?!\\w)"
[527] "(m|M)ocker\\w*" "(m|M)ocking(?!\\w)"
[529] "(m|M)ocks(?!\\w)" "(m|M)olest\\w*"
[531] "(m|M)ooch\\w*" "(m|M)ood(?!\\w)"
[533] "(m|M)oodi\\w*" "(m|M)oods(?!\\w)"
[535] "(m|M)oody(?!\\w)" "(m|M)oron\\w*"
[537] "(m|M)ourn\\w*" "(m|M)urder\\w*"
[539] "(n|N)ag\\w*" "(n|N)ast\\w*"
[541] "(n|N)eat\\w*" "(n|N)eedy(?!\\w)"
[543] "(n|N)eglect\\w*" "(n|N)erd\\w*"
[545] "(n|N)ervous\\w*" "(n|N)eurotic\\w*"
[547] "(n|N)ice\\w*" "(n|N)umb\\w*"
[549] "(n|N)urtur\\w*" "(o|O)bnoxious\\w*"
[551] "(o|O)bsess\\w*" "(o|O)ffence\\w*"
[553] "(o|O)ffend\\w*" "(o|O)ffens\\w*"
[555] "(o|O)k(?!\\w)" "(o|O)kay(?!\\w)"
[557] "(o|O)kays(?!\\w)" "(o|O)ks(?!\\w)"
[559] "(o|O)penminded\\w*" "(o|O)penness(?!\\w)"
[561] "(o|O)pportun\\w*" "(o|O)ptimal\\w*"
[563] "(o|O)ptimi\\w*" "(o|O)riginal(?!\\w)"
[565] "(o|O)utgoing(?!\\w)" "(o|O)utrag\\w*"
[567] "(o|O)verwhelm\\w*" "(p|P)ain(?!\\w)"
[569] "(p|P)ained(?!\\w)" "(p|P)ainf\\w*"
[571] "(p|P)aining(?!\\w)" "(p|P)ainl\\w*"
[573] "(p|P)ains(?!\\w)" "(p|P)alatabl\\w*"
[575] "(p|P)anic\\w*" "(p|P)aradise(?!\\w)"
[577] "(p|P)aranoi\\w*" "(p|P)artie\\w*"
[579] "(p|P)arty\\w*" "(p|P)assion\\w*"
[581] "(p|P)athetic\\w*" "(p|P)eace\\w*"
[583] "(p|P)eculiar\\w*" "(p|P)erfect\\w*"
[585] "(p|P)ersonal(?!\\w)" "(p|P)erver\\w*"
[587] "(p|P)essimis\\w*" "(p|P)etrif\\w*"
[589] "(p|P)ettie\\w*" "(p|P)etty\\w*"
[591] "(p|P)hobi\\w*" "(p|P)iss\\w*"
[593] "(p|P)iti\\w*" "(p|P)ity\\w* "
[595] "(p|P)lay(?!\\w)" "(p|P)layed(?!\\w)"
[597] "(p|P)layful\\w*" "(p|P)laying(?!\\w)"
[599] "(p|P)lays(?!\\w)" "(p|P)leasant\\w*"
[601] "(p|P)lease\\w*" "(p|P)leasing(?!\\w)"
[603] "(p|P)leasur\\w*" "(p|P)oison\\w*"
[605] "(p|P)opular\\w*" "(p|P)ositiv\\w*"
[607] "(p|P)rais\\w*" "(p|P)recious\\w*"
[609] "(p|P)rejudic\\w*" "(p|P)ressur\\w*"
[611] "(p|P)rettie\\w*" "(p|P)retty(?!\\w)"
[613] "(p|P)rick\\w*" "(p|P)ride(?!\\w)"
[615] "(p|P)rivileg\\w*" "(p|P)rize\\w*"
[617] "(p|P)roblem\\w*" "(p|P)rofit\\w*"
[619] "(p|P)romis\\w*" "(p|P)rotest(?!\\w)"
[621] "(p|P)rotested(?!\\w)" "(p|P)rotesting(?!\\w)"
[623] "(p|P)roud\\w*" "(p|P)uk\\w*"
[625] "(p|P)unish\\w*" "(r|R)adian\\w*"
[627] "(r|R)age\\w*" "(r|R)aging(?!\\w)"
[629] "(r|R)ancid\\w*" "(r|R)ape\\w*"
[631] "(r|R)aping(?!\\w)" "(r|R)apist\\w*"
[633] "(r|R)eadiness(?!\\w)" "(r|R)eady(?!\\w)"
[635] "(r|R)eassur\\w*" "(r|R)ebel\\w*"
[637] "(r|R)eek\\w*" "(r|R)egret\\w*"
[639] "(r|R)eject\\w*" "(r|R)elax\\w*"
[641] "(r|R)elief(?!\\w)" "(r|R)eliev\\w*"
[643] "(r|R)eluctan\\w*" "(r|R)emorse\\w*"
[645] "(r|R)epress\\w*" "(r|R)esent\\w*"
[647] "(r|R)esign\\w*" "(r|R)esolv\\w*"
[649] "(r|R)espect (?!\\w)" "(r|R)estless\\w*"
[651] "(r|R)evenge\\w*" "(r|R)evigor\\w*"
[653] "(r|R)eward\\w*" "(r|R)ich\\w*"
[655] "(r|R)idicul\\w*" "(r|R)igid\\w*"
[657] "(r|R)isk\\w*" "(R|R)OFL(?!\\w)"
[659] "(r|R)omanc\\w*" "(r|R)omantic\\w*"
[661] "(r|R)otten(?!\\w)" "(r|R)ude\\w*"
[663] "(r|R)uin\\w*" "(s|S)ad(?!\\w)"
[665] "(s|S)adde\\w*" "(s|S)adly(?!\\w)"
[667] "(s|S)adness(?!\\w)" "(s|S)afe\\w*"
[669] "(s|S)arcas\\w*" "(s|S)atisf\\w*"
[671] "(s|S)avage\\w*" "(s|S)ave(?!\\w)"
[673] "(s|S)care\\w*" "(s|S)caring(?!\\w)"
[675] "(s|S)cary(?!\\w)" "(s|S)ceptic\\w*"
[677] "(s|S)cream\\w*" "(s|S)crew\\w*"
[679] "(s|S)ecur\\w*" "(s|S)elfish\\w*"
[681] "(s|S)entimental\\w*" "(s|S)erious(?!\\w)"
[683] "(s|S)eriously(?!\\w)" "(s|S)eriousness(?!\\w)"
[685] "(s|S)evere\\w*" "(s|S)hake\\w*"
[687] "(s|S)haki\\w*" "(s|S)haky(?!\\w)"
[689] "(s|S)hame\\w*" "(s|S)hare(?!\\w)"
[691] "(s|S)hared(?!\\w)" "(s|S)hares(?!\\w)"
[693] "(s|S)haring(?!\\w)" "(s|S)hit\\w*"
[695] "(s|S)hock\\w*" "(s|S)hook(?!\\w)"
[697] "(s|S)hy\\w*" "(s|S)icken\\w*"
[699] "(s|S)igh(?!\\w)" "(s|S)ighed(?!\\w)"
[701] "(s|S)ighing(?!\\w)" "(s|S)ighs(?!\\w)"
[703] "(s|S)illi\\w*" "(s|S)illy(?!\\w)"
[705] "(s|S)in(?!\\w)" "(s|S)incer\\w*"
[707] "(s|S)inister(?!\\w)" "(s|S)ins(?!\\w)"
[709] "(s|S)keptic\\w*" "(s|S)lut\\w*"
[711] "(s|S)mart\\w*" "(s|S)mil\\w*"
[713] "(s|S)mother\\w*" "(s|S)mug\\w*"
[715] "(s|S)nob\\w*" "(s|S)ob(?!\\w)"
[717] "(s|S)obbed(?!\\w)" "(s|S)obbing(?!\\w)"
[719] "(s|S)obs(?!\\w)" "(s|S)ociab\\w*"
[721] "(s|S)olemn\\w*" "(s|S)orrow\\w*"
[723] "(s|S)orry(?!\\w)" "(s|S)oulmate\\w*"
[725] "(s|S)pecial(?!\\w)" "(s|S)pite\\w*"
[727] "(s|S)plend\\w*" "(s|S)tammer\\w*"
[729] "(s|S)tank(?!\\w)" "(s|S)tartl\\w*"
[731] "(s|S)teal\\w*" "(s|S)tench(?!\\w)"
[733] "(s|S)tink\\w*" "(s|S)train\\w*"
[735] "(s|S)trange(?!\\w)" "(s|S)trength\\w*"
[737] "(s|S)tress\\w*" "(s|S)trong\\w*"
[739] "(s|S)truggl\\w*" "(s|S)tubborn\\w*"
[741] "(s|S)tunk(?!\\w)" "(s|S)tunned(?!\\w)"
[743] "(s|S)tuns(?!\\w)" "(s|S)tupid\\w*"
[745] "(s|S)tutter\\w*" "(s|S)ubmissive\\w*"
[747] "(s|S)ucceed\\w*" "(s|S)uccess\\w*"
[749] "(s|S)uck(?!\\w)" "(s|S)ucked(?!\\w)"
[751] "(s|S)ucker\\w*" "(s|S)ucks(?!\\w)"
[753] "(s|S)ucky(?!\\w)" "(s|S)uffer(?!\\w)"
[755] "(s|S)uffered(?!\\w)" "(s|S)ufferer\\w*"
[757] "(s|S)uffering(?!\\w)" "(s|S)uffers(?!\\w)"
[759] "(s|S)unnier(?!\\w)" "(s|S)unniest(?!\\w)"
[761] "(s|S)unny(?!\\w)" "(s|S)unshin\\w*"
[763] "(s|S)uper(?!\\w)" "(s|S)uperior\\w*"
[765] "(s|S)upport(?!\\w)" "(s|S)upported(?!\\w)"
[767] "(s|S)upporter\\w*" "(s|S)upporting(?!\\w)"
[769] "(s|S)upportive\\w*" "(s|S)upports(?!\\w)"
[771] "(s|S)uprem\\w*" "(s|S)ure\\w*"
[773] "(s|S)urpris\\w*" "(s|S)uspicio\\w*"
[775] "(s|S)weet(?!\\w)" "(s|S)weetheart\\w*"
[777] "(s|S)weetie\\w*" "(s|S)weetly(?!\\w)"
[779] "(s|S)weetness\\w*" "(s|S)weets(?!\\w)"
[781] "(t|T)alent\\w*" "(t|T)antrum\\w*"
[783] "(t|T)ears(?!\\w)" "(t|T)eas\\w*"
[785] "(t|T)ehe(?!\\w)" "(t|T)emper(?!\\w)"
[787] "(t|T)empers(?!\\w)" "(t|T)ender\\w*"
[789] "(t|T)ense\\w*" "(t|T)ensing(?!\\w)"
[791] "(t|T)ension\\w*" "(t|T)erribl\\w*"
[793] "(t|T)errific\\w*" "(t|T)errified(?!\\w)"
[795] "(t|T)errifies(?!\\w)" "(t|T)errify (?!\\w)"
[797] "(t|T)errifying(?!\\w)" "(t|T)error\\w*"
[799] "(t|T)hank(?!\\w)" "(t|T)hanked(?!\\w)"
[801] "(t|T)hankf\\w*" "(t|T)hanks(?!\\w)"
[803] "(t|T)hief(?!\\w)" "(t|T)hieve\\w*"
[805] "(t|T)houghtful\\w*" "(t|T)hreat\\w*"
[807] "(t|T)hrill\\w*" "(t|T)icked(?!\\w)"
[809] "(t|T)imid\\w*" "(t|T)oleran\\w*"
[811] "(t|T)ortur\\w*" "(t|T)ough\\w*"
[813] "(t|T)raged\\w*" "(t|T)ragic\\w* "
[815] "(t|T)ranquil\\w*" "(t|T)rauma\\w*"
[817] "(t|T)reasur\\w*" "(t|T)reat(?!\\w)"
[819] "(t|T)rembl\\w*" "(t|T)rick\\w*"
[821] "(t|T)rite(?!\\w)" "(t|T)riumph\\w*"
[823] "(t|T)rivi\\w*" "(t|T)roubl\\w*"
[825] "(t|T)rue (?!\\w)" "(t|T)rueness(?!\\w)"
[827] "(t|T)ruer(?!\\w)" "(t|T)ruest(?!\\w)"
[829] "(t|T)ruly(?!\\w)" "(t|T)rust\\w*"
[831] "(t|T)ruth\\w*" "(t|T)urmoil(?!\\w)"
[833] "(u|U)gh(?!\\w)" "(u|U)gl\\w*"
[835] "(u|U)nattractive(?!\\w)" "(u|U)ncertain\\w*"
[837] "(u|U)ncomfortabl\\w*" "(u|U)ncontrol\\w*"
[839] "(u|U)neas\\w*" "(u|U)nfortunate\\w*"
[841] "(u|U)nfriendly(?!\\w)" "(u|U)ngrateful\\w*"
[843] "(u|U)nhapp\\w*" "(u|U)nimportant(?!\\w)"
[845] "(u|U)nimpress\\w*" "(u|U)nkind(?!\\w)"
[847] "(u|U)nlov\\w*" "(u|U)npleasant(?!\\w)"
[849] "(u|U)nprotected(?!\\w)" "(u|U)nsavo\\w*"
[851] "(u|U)nsuccessful\\w*" "(u|U)nsure\\w*"
[853] "(u|U)nwelcom\\w*" "(u|U)pset\\w*"
[855] "(u|U)ptight\\w*" "(u|U)seful\\w*"
[857] "(u|U)seless\\w* " "(v|V)ain(?!\\w)"
[859] "(v|V)aluabl\\w*" "(v|V)alue(?!\\w)"
[861] "(v|V)alued(?!\\w)" "(v|V)alues(?!\\w)"
[863] "(v|V)aluing(?!\\w)" "(v|V)anity(?!\\w)"
[865] "(v|V)icious\\w*" "(v|V)ictim\\w*"
[867] "(v|V)igor\\w*" "(v|V)igour\\w*"
[869] "(v|V)ile(?!\\w)" "(v|V)illain\\w*"
[871] "(v|V)iolat\\w*" "(v|V)iolent\\w*"
[873] "(v|V)irtue\\w*" "(v|V)irtuo\\w*"
[875] "(v|V)ital\\w*" "(v|V)ulnerab\\w*"
[877] "(v|V)ulture\\w*" "(w|W)ar(?!\\w)"
[879] "(w|W)arfare\\w*" "(w|W)arm\\w*"
[881] "(w|W)arred(?!\\w)" "(w|W)arring(?!\\w)"
[883] "(w|W)ars(?!\\w)" "(w|W)eak\\w*"
[885] "(w|W)ealth\\w*" "(w|W)eapon\\w*"
[887] "(w|W)eep\\w*" "(w|W)eird\\w*"
[889] "(w|W)elcom\\w*" "(w|W)ell\\w*"
[891] "(w|W)ept(?!\\w)" "(w|W)hine\\w*"
[893] "(w|W)hining(?!\\w)" "(w|W)hore\\w*"
[895] "(w|W)icked\\w*" "(w|W)illing(?!\\w)"
[897] "(w|W)imp\\w*" "(w|W)in(?!\\w)"
[899] "(w|W)inn\\w*" "(w|W)ins(?!\\w)"
[901] "(w|W)isdom(?!\\w)" "(w|W)ise\\w*"
[903] "(w|W)itch(?!\\w)" "(w|W)oe\\w*"
[905] "(w|W)on(?!\\w)" "(w|W)onderf\\w*"
[907] "(w|W)orr\\w*" "(w|W)orse\\w*"
[909] "(w|W)orship\\w*" "(w|W)orst(?!\\w)"
[911] "(w|W)orthless\\w* " "(w|W)orthwhile(?!\\w)"
[913] "(w|W)ow\\w*" "(w|W)rong\\w*"
[915] "(y|Y)ay(?!\\w)" "(y|Y)ays(?!\\w)"
[917] "(y|Y)earn\\w*"
Now, let’s combine the patterns using str_flatten(), which merges a vector of strings into a single string. To separate them with "|", we specify collapse = "|":
str_flatten(dict_affect$regex,collapse="|")
[1] "(a|A)bandon\\w*|(a|A)buse\\w*|(a|A)busi\\w*|(a|A)ccept(?!\\w)|(a|A)ccepta\\w*|(a|A)ccepted(?!\\w)|(a|A)ccepting(?!\\w)|(a|A)ccepts(?!\\w)|(a|A)che\\w*|(a|A)ching(?!\\w)|(a|A)ctive\\w*|(a|A)dmir\\w*|(a|A)dor\\w*|(a|A)dvantag\\w*|(a|A)dventur\\w*|(a|A)dvers\\w*|(a|A)ffection\\w*|(a|A)fraid(?!\\w)|(a|A)ggravat\\w*|(a|A)ggress\\w*|(a|A)gitat\\w*|(a|A)goniz\\w*|(a|A)gony(?!\\w)|(a|A)gree(?!\\w)|(a|A)greeab\\w*|(a|A)greed(?!\\w)|(a|A)greeing(?!\\w)|(a|A)greement\\w*|(a|A)grees(?!\\w)|(a|A)larm\\w*|(a|A)lone(?!\\w)|(a|A)lright\\w*|(a|A)maz\\w*|(a|A)mor\\w*|(a|A)mus\\w*|(a|A)nger\\w*|(a|A)ngr\\w*|(a|A)nguish\\w*|(a|A)nnoy\\w*|(a|A)ntagoni\\w*|(a|A)nxi\\w*|(a|A)ok(?!\\w)|(a|A)path\\w*|(a|A)ppall\\w*|(a|A)ppreciat\\w*|(a|A)pprehens\\w*|(a|A)rgh\\w*|(a|A)rgu\\w*|(a|A)rrogan\\w*|(a|A)sham\\w*|(a|A)ssault\\w*|(a|A)sshole\\w*|(a|A)ssur\\w*|(a|A)ttachment\\w*|(a|A)ttack\\w*|(a|A)ttract\\w*|(a|A)versi\\w*|(a|A)void\\w*|(a|A)ward\\w*|(a|A)wesome(?!\\w)|(a|A)wful(?!\\w)|(a|A)wkward\\w*|(b|B)ad(?!\\w)|(b|B)ashful\\w*|(b|B)astard\\w*|(b|B)attl\\w*|(b|B)eaten(?!\\w)|(b|B)eaut\\w*|(b|B)eloved(?!\\w)|(b|B)enefic\\w*|(b|B)enefit(?!\\w)|(b|B)enefits(?!\\w)|(b|B)enefitt\\w*|(b|B)enevolen\\w*|(b|B)enign\\w*|(b|B)est(?!\\w)|(b|B)etter(?!\\w)|(b|B)itch\\w*|(b|B)itter\\w*|(b|B)lam\\w*|(b|B)less\\w*|(b|B)old\\w*|(b|B)onus\\w*|(b|B)ore\\w*|(b|B)oring(?!\\w)|(b|B)other\\w*|(b|B)rave\\w*|(b|B)right\\w*|(b|B)rillian\\w*|(b|B)roke(?!\\w)|(b|B)rutal\\w*|(b|B)urden\\w*|(c|C)alm\\w*|(c|C)are(?!\\w)|(c|C)ared(?!\\w)|(c|C)arefree(?!\\w)|(c|C)areful\\w*|(c|C)areless\\w*|(c|C)ares(?!\\w)|(c|C)aring(?!\\w)|(c|C)asual(?!\\w)|(c|C)asually(?!\\w)|(c|C)ertain\\w*|(c|C)halleng\\w*|(c|C)hamp\\w*|(c|C)harit\\w*|(c|C)harm\\w*|(c|C)heat\\w*|(c|C)heer\\w*|(c|C)herish\\w*|(c|C)huckl\\w*|(c|C)lever\\w*|(c|C)omed\\w*|(c|C)omfort\\w*|(c|C)ommitment\\w*|(c|C)ompassion\\w*|(c|C)omplain\\w*|(c|C)ompliment\\w*|(c|C)oncerned(?!\\w)|(c|C)onfidence(?!\\w)|(c|C)onfident(?!\\w)|(c|C)onfidently(?!\\w)|(c|C)onfront\\w*|(c|C)onfu
s\\w*|(c|C)onsiderate(?!\\w)|(c|C)ontempt\\w*|(c|C)ontented\\w*|(c|C)ontentment(?!\\w)|(c|C)ontradic\\w*|(c|C)onvinc\\w*|(c|C)ool(?!\\w)|(c|C)ourag\\w*|(c|C)rap(?!\\w)|(c|C)rappy(?!\\w)|(c|C)raz\\w*|(c|C)reate\\w*|(c|C)reati\\w*|(c|C)redit\\w*|(c|C)ried(?!\\w)|(c|C)ries(?!\\w)|(c|C)ritical(?!\\w)|(c|C)ritici\\w*|(c|C)rude\\w*|(c|C)ruel\\w*|(c|C)rushed(?!\\w)|(c|C)ry(?!\\w)|(c|C)rying(?!\\w)|(c|C)unt\\w*|(c|C)ut(?!\\w)|(c|C)ute\\w*|(c|C)utie\\w*|(c|C)ynic(?!\\w)|(d|D)amag\\w*|(d|D)amn\\w*|(d|D)anger\\w*|(d|D)aring(?!\\w)|(d|D)arlin\\w*|(d|D)aze\\w*|(d|D)ear\\w*|(d|D)ecay\\w*|(d|D)efeat\\w*|(d|D)efect\\w*|(d|D)efenc\\w*|(d|D)efens\\w*|(d|D)efinite(?!\\w)|(d|D)efinitely(?!\\w)|(d|D)egrad\\w*|(d|D)electabl\\w*|(d|D)elicate\\w*|(d|D)elicious\\w*|(d|D)eligh\\w*|(d|D)epress\\w*|(d|D)epriv\\w*|(d|D)espair\\w*|(d|D)esperat\\w*|(d|D)espis\\w*|(d|D)estroy\\w*|(d|D)estruct\\w*|(d|D)etermina\\w*|(d|D)etermined(?!\\w)|(d|D)evastat\\w*|(d|D)evil\\w*|(d|D)evot\\w*|(d|D)ifficult\\w*|(d|D)igni\\w*|(d|D)isadvantage\\w*|(d|D)isagree\\w*|(d|D)isappoint\\w*|(d|D)isaster\\w*|(d|D)iscomfort\\w*|(d|D)iscourag\\w*|(d|D)isgust\\w*|(d|D)ishearten\\w*|(d|D)isillusion\\w*|(d|D)islike(?!\\w)|(d|D)isliked(?!\\w)|(d|D)islikes(?!\\w)|(d|D)isliking(?!\\w)|(d|D)ismay\\w*|(d|D)issatisf\\w*|(d|D)istract\\w*|(d|D)istraught(?!\\w)|(d|D)istress\\w*|(d|D)istrust\\w*|(d|D)isturb\\w*|(d|D)ivin\\w*|(d|D)omina\\w*|(d|D)oom\\w*|(d|D)ork\\w*|(d|D)oubt\\w*|(d|D)read\\w*|(d|D)ull\\w*|(d|D)umb\\w*|(d|D)ump\\w*|(d|D)well\\w*|(d|D)ynam\\w*|(e|E)ager\\w*|(e|E)ase\\w*|(e|E)asie\\w*|(e|E)asily(?!\\w)|(e|E)asiness(?!\\w)|(e|E)asing(?!\\w)|(e|E)asy\\w*|(e|E)csta\\w*|(e|E)fficien\\w*|(e|E)gotis\\w*|(e|E)legan\\w*|(e|E)mbarrass\\w*|(e|E)motion(?!\\w)|(e|E)motion(?!\\w)|(e|E)motional(?!\\w)|(e|E)mpt\\w*|(e|E)ncourag\\w*|(e|E)nemie\\w*|(e|E)nemy\\w*|(e|E)nerg\\w*|(e|E)ngag\\w*|(e|E)njoy\\w*|(e|E)nrag\\w*|(e|E)ntertain\\w*|(e|E)nthus\\w*|(e|E)nvie\\w*|(e|E)nvious(?!\\w)|(e|E)nvy\\w*|(e|E)vil\\w*|(e|E)xcel\\w*|(e|E)xcit\\w*|(e|E
)xcruciat\\w*|(e|E)xhaust\\w*|(f|F)ab(?!\\w)|(f|F)abulous\\w*|(f|F)ail\\w*|(f|F)aith\\w*|(f|F)ake(?!\\w)|(f|F)antastic\\w*|(f|F)atal\\w*|(f|F)atigu\\w*|(f|F)ault\\w*|(f|F)avor\\w*|(f|F)avour\\w*|(f|F)ear(?!\\w)|(f|F)eared(?!\\w)|(f|F)earful\\w*|(f|F)earing(?!\\w)|(f|F)earless\\w*|(f|F)ears(?!\\w)|(f|F)eroc\\w*|(f|F)estiv\\w*|(f|F)eud\\w*|(f|F)iery(?!\\w)|(f|F)iesta\\w*|(f|F)ight\\w*|(f|F)ine(?!\\w)|(f|F)ired(?!\\w)|(f|F)latter\\w*|(f|F)lawless\\w*|(f|F)lexib\\w*|(f|F)lirt\\w*|(f|F)lunk\\w*|(f|F)oe\\w*|(f|F)ond(?!\\w)|(f|F)ondly(?!\\w)|(f|F)ondness(?!\\w)|(f|F)ool\\w*|(f|F)orbid\\w*|(f|F)orgave(?!\\w)|(f|F)orgiv\\w*|(f|F)ought(?!\\w)|(f|F)rantic\\w*|(f|F)reak\\w*|(f|F)ree(?!\\w)|(f|F)reeb\\w*|(f|F)reed\\w*|(f|F)reeing(?!\\w)|(f|F)reely(?!\\w)|(f|F)reeness(?!\\w)|(f|F)reer(?!\\w)|(f|F)rees\\w*|(f|F)riend\\w*|(f|F)right\\w*|(f|F)rustrat\\w*|(f|F)uck(?!\\w)|(f|F)ucked\\w*|(f|F)ucker\\w*|(f|F)uckin\\w*|(f|F)ucks(?!\\w)|(f|F)ume\\w*|(f|F)uming(?!\\w)|(f|F)un(?!\\w)|(f|F)unn\\w*|(f|F)urious\\w*|(f|F)ury(?!\\w)|(g|G)eek\\w*|(g|G)enero\\w*|(g|G)entle(?!\\w)|(g|G)entler(?!\\w)|(g|G)entlest(?!\\w)|(g|G)ently(?!\\w)|(g|G)iggl\\w*|(g|G)iver\\w*|(g|G)iving(?!\\w)|(g|G)lad(?!\\w)|(g|G)ladly(?!\\w)|(g|G)lamor\\w*|(g|G)lamour\\w*|(g|G)loom\\w*|(g|G)lori\\w*|(g|G)lory(?!\\w)|(g|G)oddam\\w*|(g|G)ood(?!\\w)|(g|G)oodness(?!\\w)|(g|G)orgeous\\w*|(g|G)ossip\\w*|(g|G)race(?!\\w)|(g|G)raced(?!\\w)|(g|G)raceful\\w*|(g|G)races(?!\\w)|(g|G)raci\\w*|(g|G)rand(?!\\w)|(g|G)rande\\w*|(g|G)ratef\\w*|(g|G)rati\\w*|(g|G)rave\\w*|(g|G)reat(?!\\w)|(g|G)reed\\w*|(g|G)rief(?!\\w)|(g|G)riev\\w*|(g|G)rim\\w*|(g|G)rin(?!\\w)|(g|G)rinn\\w*|(g|G)rins(?!\\w)|(g|G)ross\\w*|(g|G)rouch\\w*|(g|G)rr\\w*|(g|G)uilt\\w*|(h|H)a(?!\\w)|(h|H)aha\\w*|(h|H)andsom\\w*|(h|H)appi\\w*|(h|H)appy(?!\\w)|(h|H)arass\\w*|(h|H)arm(?!\\w)|(h|H)armed(?!\\w)|(h|H)armful\\w*|(h|H)arming(?!\\w)|(h|H)armless\\w*|(h|H)armon\\w*|(h|H)arms(?!\\w)|(h|H)ate(?!\\w)|(h|H)ated(?!\\w)|(h|H)ateful\\w*|(h|H)ater\\w*|(h|H)ates(?!\\w)|(h|H)ating(?!\\w
)|(h|H)atred(?!\\w)|(h|H)azy(?!\\w)|(h|H)eartbreak\\w*|(h|H)eartbroke\\w*|(h|H)eartfelt(?!\\w)|(h|H)eartless\\w*|(h|H)eartwarm\\w*|(h|H)eaven\\w*|(h|H)eh\\w*|(h|H)ell(?!\\w)|(h|H)ellish(?!\\w)|(h|H)elper\\w*|(h|H)elpful\\w*|(h|H)elping(?!\\w)|(h|H)elpless\\w*|(h|H)elps(?!\\w)|(h|H)ero\\w*|(h|H)esita\\w*|(h|H)ilarious(?!\\w)|(h|H)oho\\w*|(h|H)omesick\\w*|(h|H)onest\\w*|(h|H)onor\\w*|(h|H)onour\\w*|(h|H)ope(?!\\w)|(h|H)oped(?!\\w)|(h|H)opeful(?!\\w)|(h|H)opefully(?!\\w)|(h|H)opefulness(?!\\w)|(h|H)opeless\\w*|(h|H)opes(?!\\w)|(h|H)oping(?!\\w)|(h|H)orr\\w*|(h|H)ostil\\w*|(h|H)ug(?!\\w)|(h|H)ugg\\w*|(h|H)ugs(?!\\w)|(h|H)umiliat\\w*|(h|H)umor\\w*|(h|H)umour\\w*|(h|H)urra\\w*|(h|H)urt\\w*|(i|I)deal\\w*|(i|I)diot(?!\\w)|(i|I)gnor\\w*|(i|I)mmoral\\w*|(i|I)mpatien\\w*|(i|I)mpersonal(?!\\w)|(i|I)mpolite\\w*|(i|I)mportan\\w*|(i|I)mpress\\w*|(i|I)mprove\\w*|(i|I)mproving(?!\\w)|(i|I)nadequa\\w*|(i|I)ncentive\\w*|(i|I)ndecis\\w*|(i|I)neffect\\w*|(i|I)nferior\\w* |(i|I)nhib\\w*|(i|I)nnocen\\w*|(i|I)nsecur\\w*|(i|I)nsincer\\w*|(i|I)nspir\\w*|(i|I)nsult\\w*|(i|I)ntell\\w*|(i|I)nterest\\w*|(i|I)nterrup\\w*|(i|I)ntimidat\\w*|(i|I)nvigor\\w*|(i|I)rrational\\w*|(i|I)rrita\\w*|(i|I)solat\\w*|(j|J)aded(?!\\w)|(j|J)ealous\\w*|(j|J)erk(?!\\w)|(j|J)erked(?!\\w)|(j|J)erks(?!\\w)|(j|J)oke\\w*|(j|J)oking(?!\\w)|(j|J)oll\\w*|(j|J)oy\\w*|(k|K)een\\w*|(k|K)idding(?!\\w)|(k|K)ill\\w*|(k|K)ind(?!\\w)|(k|K)indly(?!\\w)|(k|K)indn\\w*|(k|K)iss\\w*|(l|L)aidback(?!\\w)|(l|L)ame\\w*|(l|L)augh\\w*|(l|L)azie\\w*|(l|L)azy(?!\\w)|(l|L)iabilit\\w*|(l|L)iar\\w*|(l|L)ibert\\w*|(l|L)ied(?!\\w)|(l|L)ies(?!\\w)|(l|L)ike(?!\\w)|(l|L)ikeab\\w*|(l|L)iked(?!\\w)|(l|L)ikes(?!\\w)|(l|L)iking(?!\\w)|(l|L)ivel\\w*|(L|L)MAO(?!\\w)|(L|L)OL(?!\\w)|(l|L)one\\w*|(l|L)onging\\w*|(l|L)ose(?!\\w)|(l|L)oser\\w*|(l|L)oses(?!\\w)|(l|L)osing(?!\\w)|(l|L)oss\\w*|(l|L)ost(?!\\w)|(l|L)ous\\w*|(l|L)ove(?!\\w)|(l|L)oved(?!\\w)|(l|L)ovely(?!\\w)|(l|L)over\\w*|(l|L)oves(?!\\w)|(l|L)oving\\w*|(l|L)ow\\w*|(l|L)oyal\\w*|(l|L)uck(?!\\w)|(l|L)u
cked(?!\\w)|(l|L)ucki\\w*|(l|L)uckless\\w*|(l|L)ucks(?!\\w)|(l|L)ucky(?!\\w)|(l|L)udicrous\\w*|(l|L)ying(?!\\w)|(m|M)ad(?!\\w)|(m|M)addening(?!\\w)|(m|M)adder(?!\\w)|(m|M)addest(?!\\w)|(m|M)adly(?!\\w)|(m|M)agnific\\w*|(m|M)aniac\\w*|(m|M)asochis\\w*|(m|M)elanchol\\w*|(m|M)erit\\w*|(m|M)err\\w*|(m|M)ess(?!\\w)|(m|M)essy(?!\\w)|(m|M)iser\\w*|(m|M)iss(?!\\w)|(m|M)issed(?!\\w)|(m|M)isses(?!\\w)|(m|M)issing(?!\\w)|(m|M)istak\\w*|(m|M)ock(?!\\w)|(m|M)ocked(?!\\w)|(m|M)ocker\\w*|(m|M)ocking(?!\\w)|(m|M)ocks(?!\\w)|(m|M)olest\\w*|(m|M)ooch\\w*|(m|M)ood(?!\\w)|(m|M)oodi\\w*|(m|M)oods(?!\\w)|(m|M)oody(?!\\w)|(m|M)oron\\w*|(m|M)ourn\\w*|(m|M)urder\\w*|(n|N)ag\\w*|(n|N)ast\\w*|(n|N)eat\\w*|(n|N)eedy(?!\\w)|(n|N)eglect\\w*|(n|N)erd\\w*|(n|N)ervous\\w*|(n|N)eurotic\\w*|(n|N)ice\\w*|(n|N)umb\\w*|(n|N)urtur\\w*|(o|O)bnoxious\\w*|(o|O)bsess\\w*|(o|O)ffence\\w*|(o|O)ffend\\w*|(o|O)ffens\\w*|(o|O)k(?!\\w)|(o|O)kay(?!\\w)|(o|O)kays(?!\\w)|(o|O)ks(?!\\w)|(o|O)penminded\\w*|(o|O)penness(?!\\w)|(o|O)pportun\\w*|(o|O)ptimal\\w*|(o|O)ptimi\\w*|(o|O)riginal(?!\\w)|(o|O)utgoing(?!\\w)|(o|O)utrag\\w*|(o|O)verwhelm\\w*|(p|P)ain(?!\\w)|(p|P)ained(?!\\w)|(p|P)ainf\\w*|(p|P)aining(?!\\w)|(p|P)ainl\\w*|(p|P)ains(?!\\w)|(p|P)alatabl\\w*|(p|P)anic\\w*|(p|P)aradise(?!\\w)|(p|P)aranoi\\w*|(p|P)artie\\w*|(p|P)arty\\w*|(p|P)assion\\w*|(p|P)athetic\\w*|(p|P)eace\\w*|(p|P)eculiar\\w*|(p|P)erfect\\w*|(p|P)ersonal(?!\\w)|(p|P)erver\\w*|(p|P)essimis\\w*|(p|P)etrif\\w*|(p|P)ettie\\w*|(p|P)etty\\w*|(p|P)hobi\\w*|(p|P)iss\\w*|(p|P)iti\\w*|(p|P)ity\\w* 
|(p|P)lay(?!\\w)|(p|P)layed(?!\\w)|(p|P)layful\\w*|(p|P)laying(?!\\w)|(p|P)lays(?!\\w)|(p|P)leasant\\w*|(p|P)lease\\w*|(p|P)leasing(?!\\w)|(p|P)leasur\\w*|(p|P)oison\\w*|(p|P)opular\\w*|(p|P)ositiv\\w*|(p|P)rais\\w*|(p|P)recious\\w*|(p|P)rejudic\\w*|(p|P)ressur\\w*|(p|P)rettie\\w*|(p|P)retty(?!\\w)|(p|P)rick\\w*|(p|P)ride(?!\\w)|(p|P)rivileg\\w*|(p|P)rize\\w*|(p|P)roblem\\w*|(p|P)rofit\\w*|(p|P)romis\\w*|(p|P)rotest(?!\\w)|(p|P)rotested(?!\\w)|(p|P)rotesting(?!\\w)|(p|P)roud\\w*|(p|P)uk\\w*|(p|P)unish\\w*|(r|R)adian\\w*|(r|R)age\\w*|(r|R)aging(?!\\w)|(r|R)ancid\\w*|(r|R)ape\\w*|(r|R)aping(?!\\w)|(r|R)apist\\w*|(r|R)eadiness(?!\\w)|(r|R)eady(?!\\w)|(r|R)eassur\\w*|(r|R)ebel\\w*|(r|R)eek\\w*|(r|R)egret\\w*|(r|R)eject\\w*|(r|R)elax\\w*|(r|R)elief(?!\\w)|(r|R)eliev\\w*|(r|R)eluctan\\w*|(r|R)emorse\\w*|(r|R)epress\\w*|(r|R)esent\\w*|(r|R)esign\\w*|(r|R)esolv\\w*|(r|R)espect (?!\\w)|(r|R)estless\\w*|(r|R)evenge\\w*|(r|R)evigor\\w*|(r|R)eward\\w*|(r|R)ich\\w*|(r|R)idicul\\w*|(r|R)igid\\w*|(r|R)isk\\w*|(R|R)OFL(?!\\w)|(r|R)omanc\\w*|(r|R)omantic\\w*|(r|R)otten(?!\\w)|(r|R)ude\\w*|(r|R)uin\\w*|(s|S)ad(?!\\w)|(s|S)adde\\w*|(s|S)adly(?!\\w)|(s|S)adness(?!\\w)|(s|S)afe\\w*|(s|S)arcas\\w*|(s|S)atisf\\w*|(s|S)avage\\w*|(s|S)ave(?!\\w)|(s|S)care\\w*|(s|S)caring(?!\\w)|(s|S)cary(?!\\w)|(s|S)ceptic\\w*|(s|S)cream\\w*|(s|S)crew\\w*|(s|S)ecur\\w*|(s|S)elfish\\w*|(s|S)entimental\\w*|(s|S)erious(?!\\w)|(s|S)eriously(?!\\w)|(s|S)eriousness(?!\\w)|(s|S)evere\\w*|(s|S)hake\\w*|(s|S)haki\\w*|(s|S)haky(?!\\w)|(s|S)hame\\w*|(s|S)hare(?!\\w)|(s|S)hared(?!\\w)|(s|S)hares(?!\\w)|(s|S)haring(?!\\w)|(s|S)hit\\w*|(s|S)hock\\w*|(s|S)hook(?!\\w)|(s|S)hy\\w*|(s|S)icken\\w*|(s|S)igh(?!\\w)|(s|S)ighed(?!\\w)|(s|S)ighing(?!\\w)|(s|S)ighs(?!\\w)|(s|S)illi\\w*|(s|S)illy(?!\\w)|(s|S)in(?!\\w)|(s|S)incer\\w*|(s|S)inister(?!\\w)|(s|S)ins(?!\\w)|(s|S)keptic\\w*|(s|S)lut\\w*|(s|S)mart\\w*|(s|S)mil\\w*|(s|S)mother\\w*|(s|S)mug\\w*|(s|S)nob\\w*|(s|S)ob(?!\\w)|(s|S)obbed(?!\\w)|(s|S)obbing(?!\\w)|(s|S)obs(?!\\w)|(
s|S)ociab\\w*|(s|S)olemn\\w*|(s|S)orrow\\w*|(s|S)orry(?!\\w)|(s|S)oulmate\\w*|(s|S)pecial(?!\\w)|(s|S)pite\\w*|(s|S)plend\\w*|(s|S)tammer\\w*|(s|S)tank(?!\\w)|(s|S)tartl\\w*|(s|S)teal\\w*|(s|S)tench(?!\\w)|(s|S)tink\\w*|(s|S)train\\w*|(s|S)trange(?!\\w)|(s|S)trength\\w*|(s|S)tress\\w*|(s|S)trong\\w*|(s|S)truggl\\w*|(s|S)tubborn\\w*|(s|S)tunk(?!\\w)|(s|S)tunned(?!\\w)|(s|S)tuns(?!\\w)|(s|S)tupid\\w*|(s|S)tutter\\w*|(s|S)ubmissive\\w*|(s|S)ucceed\\w*|(s|S)uccess\\w*|(s|S)uck(?!\\w)|(s|S)ucked(?!\\w)|(s|S)ucker\\w*|(s|S)ucks(?!\\w)|(s|S)ucky(?!\\w)|(s|S)uffer(?!\\w)|(s|S)uffered(?!\\w)|(s|S)ufferer\\w*|(s|S)uffering(?!\\w)|(s|S)uffers(?!\\w)|(s|S)unnier(?!\\w)|(s|S)unniest(?!\\w)|(s|S)unny(?!\\w)|(s|S)unshin\\w*|(s|S)uper(?!\\w)|(s|S)uperior\\w*|(s|S)upport(?!\\w)|(s|S)upported(?!\\w)|(s|S)upporter\\w*|(s|S)upporting(?!\\w)|(s|S)upportive\\w*|(s|S)upports(?!\\w)|(s|S)uprem\\w*|(s|S)ure\\w*|(s|S)urpris\\w*|(s|S)uspicio\\w*|(s|S)weet(?!\\w)|(s|S)weetheart\\w*|(s|S)weetie\\w*|(s|S)weetly(?!\\w)|(s|S)weetness\\w*|(s|S)weets(?!\\w)|(t|T)alent\\w*|(t|T)antrum\\w*|(t|T)ears(?!\\w)|(t|T)eas\\w*|(t|T)ehe(?!\\w)|(t|T)emper(?!\\w)|(t|T)empers(?!\\w)|(t|T)ender\\w*|(t|T)ense\\w*|(t|T)ensing(?!\\w)|(t|T)ension\\w*|(t|T)erribl\\w*|(t|T)errific\\w*|(t|T)errified(?!\\w)|(t|T)errifies(?!\\w)|(t|T)errify (?!\\w)|(t|T)errifying(?!\\w)|(t|T)error\\w*|(t|T)hank(?!\\w)|(t|T)hanked(?!\\w)|(t|T)hankf\\w*|(t|T)hanks(?!\\w)|(t|T)hief(?!\\w)|(t|T)hieve\\w*|(t|T)houghtful\\w*|(t|T)hreat\\w*|(t|T)hrill\\w*|(t|T)icked(?!\\w)|(t|T)imid\\w*|(t|T)oleran\\w*|(t|T)ortur\\w*|(t|T)ough\\w*|(t|T)raged\\w*|(t|T)ragic\\w* |(t|T)ranquil\\w*|(t|T)rauma\\w*|(t|T)reasur\\w*|(t|T)reat(?!\\w)|(t|T)rembl\\w*|(t|T)rick\\w*|(t|T)rite(?!\\w)|(t|T)riumph\\w*|(t|T)rivi\\w*|(t|T)roubl\\w*|(t|T)rue 
(?!\\w)|(t|T)rueness(?!\\w)|(t|T)ruer(?!\\w)|(t|T)ruest(?!\\w)|(t|T)ruly(?!\\w)|(t|T)rust\\w*|(t|T)ruth\\w*|(t|T)urmoil(?!\\w)|(u|U)gh(?!\\w)|(u|U)gl\\w*|(u|U)nattractive(?!\\w)|(u|U)ncertain\\w*|(u|U)ncomfortabl\\w*|(u|U)ncontrol\\w*|(u|U)neas\\w*|(u|U)nfortunate\\w*|(u|U)nfriendly(?!\\w)|(u|U)ngrateful\\w*|(u|U)nhapp\\w*|(u|U)nimportant(?!\\w)|(u|U)nimpress\\w*|(u|U)nkind(?!\\w)|(u|U)nlov\\w*|(u|U)npleasant(?!\\w)|(u|U)nprotected(?!\\w)|(u|U)nsavo\\w*|(u|U)nsuccessful\\w*|(u|U)nsure\\w*|(u|U)nwelcom\\w*|(u|U)pset\\w*|(u|U)ptight\\w*|(u|U)seful\\w*|(u|U)seless\\w* |(v|V)ain(?!\\w)|(v|V)aluabl\\w*|(v|V)alue(?!\\w)|(v|V)alued(?!\\w)|(v|V)alues(?!\\w)|(v|V)aluing(?!\\w)|(v|V)anity(?!\\w)|(v|V)icious\\w*|(v|V)ictim\\w*|(v|V)igor\\w*|(v|V)igour\\w*|(v|V)ile(?!\\w)|(v|V)illain\\w*|(v|V)iolat\\w*|(v|V)iolent\\w*|(v|V)irtue\\w*|(v|V)irtuo\\w*|(v|V)ital\\w*|(v|V)ulnerab\\w*|(v|V)ulture\\w*|(w|W)ar(?!\\w)|(w|W)arfare\\w*|(w|W)arm\\w*|(w|W)arred(?!\\w)|(w|W)arring(?!\\w)|(w|W)ars(?!\\w)|(w|W)eak\\w*|(w|W)ealth\\w*|(w|W)eapon\\w*|(w|W)eep\\w*|(w|W)eird\\w*|(w|W)elcom\\w*|(w|W)ell\\w*|(w|W)ept(?!\\w)|(w|W)hine\\w*|(w|W)hining(?!\\w)|(w|W)hore\\w*|(w|W)icked\\w*|(w|W)illing(?!\\w)|(w|W)imp\\w*|(w|W)in(?!\\w)|(w|W)inn\\w*|(w|W)ins(?!\\w)|(w|W)isdom(?!\\w)|(w|W)ise\\w*|(w|W)itch(?!\\w)|(w|W)oe\\w*|(w|W)on(?!\\w)|(w|W)onderf\\w*|(w|W)orr\\w*|(w|W)orse\\w*|(w|W)orship\\w*|(w|W)orst(?!\\w)|(w|W)orthless\\w* |(w|W)orthwhile(?!\\w)|(w|W)ow\\w*|(w|W)rong\\w*|(y|Y)ay(?!\\w)|(y|Y)ays(?!\\w)|(y|Y)earn\\w*"
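The full output above is hard to read, so here is a minimal sketch of the same step on a tiny, made-up vector. The names toy_patterns, toy_regex, and toy_hits and the three example sentences are assumptions for illustration only; they are not part of dict_affect.

```r
library(stringr)

# Hypothetical three-pattern dictionary, written in the same style as dict_affect$regex
toy_patterns <- c("(h|H)appy(?!\\w)", "(s|S)ad(?!\\w)", "(j|J)oy\\w*")

# Merge the vector into a single alternation pattern, separated by "|"
toy_regex <- str_flatten(toy_patterns, collapse = "|")
toy_regex

# The combined pattern matches a string if any one of the original patterns matches it
toy_hits <- str_detect(c("Happy birthday", "a joyful day", "nothing here"), toy_regex)
toy_hits
```

Because "|" means "or" in a regular expression, a single call to str_detect() with the flattened pattern gives the same answer as checking each pattern separately and asking whether any of them matched.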
Lastly, because we want to match any one of the words, we wrap the combined pattern in parentheses. We also want every match to begin at the start of a word, so we prepend "\\b"
(a word boundary) to the group:
str_c("\\b","(",str_flatten(dict_affect$regex,collapse="|"), ")")
[1] "\\b((a|A)bandon\\w*|(a|A)buse\\w*|(a|A)busi\\w*|(a|A)ccept(?!\\w)|(a|A)ccepta\\w*|(a|A)ccepted(?!\\w)|(a|A)ccepting(?!\\w)|(a|A)ccepts(?!\\w)|(a|A)che\\w*|(a|A)ching(?!\\w)|(a|A)ctive\\w*|(a|A)dmir\\w*|(a|A)dor\\w*|(a|A)dvantag\\w*|(a|A)dventur\\w*|(a|A)dvers\\w*|(a|A)ffection\\w*|(a|A)fraid(?!\\w)|(a|A)ggravat\\w*|(a|A)ggress\\w*|(a|A)gitat\\w*|(a|A)goniz\\w*|(a|A)gony(?!\\w)|(a|A)gree(?!\\w)|(a|A)greeab\\w*|(a|A)greed(?!\\w)|(a|A)greeing(?!\\w)|(a|A)greement\\w*|(a|A)grees(?!\\w)|(a|A)larm\\w*|(a|A)lone(?!\\w)|(a|A)lright\\w*|(a|A)maz\\w*|(a|A)mor\\w*|(a|A)mus\\w*|(a|A)nger\\w*|(a|A)ngr\\w*|(a|A)nguish\\w*|(a|A)nnoy\\w*|(a|A)ntagoni\\w*|(a|A)nxi\\w*|(a|A)ok(?!\\w)|(a|A)path\\w*|(a|A)ppall\\w*|(a|A)ppreciat\\w*|(a|A)pprehens\\w*|(a|A)rgh\\w*|(a|A)rgu\\w*|(a|A)rrogan\\w*|(a|A)sham\\w*|(a|A)ssault\\w*|(a|A)sshole\\w*|(a|A)ssur\\w*|(a|A)ttachment\\w*|(a|A)ttack\\w*|(a|A)ttract\\w*|(a|A)versi\\w*|(a|A)void\\w*|(a|A)ward\\w*|(a|A)wesome(?!\\w)|(a|A)wful(?!\\w)|(a|A)wkward\\w*|(b|B)ad(?!\\w)|(b|B)ashful\\w*|(b|B)astard\\w*|(b|B)attl\\w*|(b|B)eaten(?!\\w)|(b|B)eaut\\w*|(b|B)eloved(?!\\w)|(b|B)enefic\\w*|(b|B)enefit(?!\\w)|(b|B)enefits(?!\\w)|(b|B)enefitt\\w*|(b|B)enevolen\\w*|(b|B)enign\\w*|(b|B)est(?!\\w)|(b|B)etter(?!\\w)|(b|B)itch\\w*|(b|B)itter\\w*|(b|B)lam\\w*|(b|B)less\\w*|(b|B)old\\w*|(b|B)onus\\w*|(b|B)ore\\w*|(b|B)oring(?!\\w)|(b|B)other\\w*|(b|B)rave\\w*|(b|B)right\\w*|(b|B)rillian\\w*|(b|B)roke(?!\\w)|(b|B)rutal\\w*|(b|B)urden\\w*|(c|C)alm\\w*|(c|C)are(?!\\w)|(c|C)ared(?!\\w)|(c|C)arefree(?!\\w)|(c|C)areful\\w*|(c|C)areless\\w*|(c|C)ares(?!\\w)|(c|C)aring(?!\\w)|(c|C)asual(?!\\w)|(c|C)asually(?!\\w)|(c|C)ertain\\w*|(c|C)halleng\\w*|(c|C)hamp\\w*|(c|C)harit\\w*|(c|C)harm\\w*|(c|C)heat\\w*|(c|C)heer\\w*|(c|C)herish\\w*|(c|C)huckl\\w*|(c|C)lever\\w*|(c|C)omed\\w*|(c|C)omfort\\w*|(c|C)ommitment\\w*|(c|C)ompassion\\w*|(c|C)omplain\\w*|(c|C)ompliment\\w*|(c|C)oncerned(?!\\w)|(c|C)onfidence(?!\\w)|(c|C)onfident(?!\\w)|(c|C)onfidently(?!\\w)|(c|C)onfront\\w*|(c|C)
onfus\\w*|(c|C)onsiderate(?!\\w)|(c|C)ontempt\\w*|(c|C)ontented\\w*|(c|C)ontentment(?!\\w)|(c|C)ontradic\\w*|(c|C)onvinc\\w*|(c|C)ool(?!\\w)|(c|C)ourag\\w*|(c|C)rap(?!\\w)|(c|C)rappy(?!\\w)|(c|C)raz\\w*|(c|C)reate\\w*|(c|C)reati\\w*|(c|C)redit\\w*|(c|C)ried(?!\\w)|(c|C)ries(?!\\w)|(c|C)ritical(?!\\w)|(c|C)ritici\\w*|(c|C)rude\\w*|(c|C)ruel\\w*|(c|C)rushed(?!\\w)|(c|C)ry(?!\\w)|(c|C)rying(?!\\w)|(c|C)unt\\w*|(c|C)ut(?!\\w)|(c|C)ute\\w*|(c|C)utie\\w*|(c|C)ynic(?!\\w)|(d|D)amag\\w*|(d|D)amn\\w*|(d|D)anger\\w*|(d|D)aring(?!\\w)|(d|D)arlin\\w*|(d|D)aze\\w*|(d|D)ear\\w*|(d|D)ecay\\w*|(d|D)efeat\\w*|(d|D)efect\\w*|(d|D)efenc\\w*|(d|D)efens\\w*|(d|D)efinite(?!\\w)|(d|D)efinitely(?!\\w)|(d|D)egrad\\w*|(d|D)electabl\\w*|(d|D)elicate\\w*|(d|D)elicious\\w*|(d|D)eligh\\w*|(d|D)epress\\w*|(d|D)epriv\\w*|(d|D)espair\\w*|(d|D)esperat\\w*|(d|D)espis\\w*|(d|D)estroy\\w*|(d|D)estruct\\w*|(d|D)etermina\\w*|(d|D)etermined(?!\\w)|(d|D)evastat\\w*|(d|D)evil\\w*|(d|D)evot\\w*|(d|D)ifficult\\w*|(d|D)igni\\w*|(d|D)isadvantage\\w*|(d|D)isagree\\w*|(d|D)isappoint\\w*|(d|D)isaster\\w*|(d|D)iscomfort\\w*|(d|D)iscourag\\w*|(d|D)isgust\\w*|(d|D)ishearten\\w*|(d|D)isillusion\\w*|(d|D)islike(?!\\w)|(d|D)isliked(?!\\w)|(d|D)islikes(?!\\w)|(d|D)isliking(?!\\w)|(d|D)ismay\\w*|(d|D)issatisf\\w*|(d|D)istract\\w*|(d|D)istraught(?!\\w)|(d|D)istress\\w*|(d|D)istrust\\w*|(d|D)isturb\\w*|(d|D)ivin\\w*|(d|D)omina\\w*|(d|D)oom\\w*|(d|D)ork\\w*|(d|D)oubt\\w*|(d|D)read\\w*|(d|D)ull\\w*|(d|D)umb\\w*|(d|D)ump\\w*|(d|D)well\\w*|(d|D)ynam\\w*|(e|E)ager\\w*|(e|E)ase\\w*|(e|E)asie\\w*|(e|E)asily(?!\\w)|(e|E)asiness(?!\\w)|(e|E)asing(?!\\w)|(e|E)asy\\w*|(e|E)csta\\w*|(e|E)fficien\\w*|(e|E)gotis\\w*|(e|E)legan\\w*|(e|E)mbarrass\\w*|(e|E)motion(?!\\w)|(e|E)motion(?!\\w)|(e|E)motional(?!\\w)|(e|E)mpt\\w*|(e|E)ncourag\\w*|(e|E)nemie\\w*|(e|E)nemy\\w*|(e|E)nerg\\w*|(e|E)ngag\\w*|(e|E)njoy\\w*|(e|E)nrag\\w*|(e|E)ntertain\\w*|(e|E)nthus\\w*|(e|E)nvie\\w*|(e|E)nvious(?!\\w)|(e|E)nvy\\w*|(e|E)vil\\w*|(e|E)xcel\\w*|(e|E)xcit\\w*|
(e|E)xcruciat\\w*|(e|E)xhaust\\w*|(f|F)ab(?!\\w)|(f|F)abulous\\w*|(f|F)ail\\w*|(f|F)aith\\w*|(f|F)ake(?!\\w)|(f|F)antastic\\w*|(f|F)atal\\w*|(f|F)atigu\\w*|(f|F)ault\\w*|(f|F)avor\\w*|(f|F)avour\\w*|(f|F)ear(?!\\w)|(f|F)eared(?!\\w)|(f|F)earful\\w*|(f|F)earing(?!\\w)|(f|F)earless\\w*|(f|F)ears(?!\\w)|(f|F)eroc\\w*|(f|F)estiv\\w*|(f|F)eud\\w*|(f|F)iery(?!\\w)|(f|F)iesta\\w*|(f|F)ight\\w*|(f|F)ine(?!\\w)|(f|F)ired(?!\\w)|(f|F)latter\\w*|(f|F)lawless\\w*|(f|F)lexib\\w*|(f|F)lirt\\w*|(f|F)lunk\\w*|(f|F)oe\\w*|(f|F)ond(?!\\w)|(f|F)ondly(?!\\w)|(f|F)ondness(?!\\w)|(f|F)ool\\w*|(f|F)orbid\\w*|(f|F)orgave(?!\\w)|(f|F)orgiv\\w*|(f|F)ought(?!\\w)|(f|F)rantic\\w*|(f|F)reak\\w*|(f|F)ree(?!\\w)|(f|F)reeb\\w*|(f|F)reed\\w*|(f|F)reeing(?!\\w)|(f|F)reely(?!\\w)|(f|F)reeness(?!\\w)|(f|F)reer(?!\\w)|(f|F)rees\\w*|(f|F)riend\\w*|(f|F)right\\w*|(f|F)rustrat\\w*|(f|F)uck(?!\\w)|(f|F)ucked\\w*|(f|F)ucker\\w*|(f|F)uckin\\w*|(f|F)ucks(?!\\w)|(f|F)ume\\w*|(f|F)uming(?!\\w)|(f|F)un(?!\\w)|(f|F)unn\\w*|(f|F)urious\\w*|(f|F)ury(?!\\w)|(g|G)eek\\w*|(g|G)enero\\w*|(g|G)entle(?!\\w)|(g|G)entler(?!\\w)|(g|G)entlest(?!\\w)|(g|G)ently(?!\\w)|(g|G)iggl\\w*|(g|G)iver\\w*|(g|G)iving(?!\\w)|(g|G)lad(?!\\w)|(g|G)ladly(?!\\w)|(g|G)lamor\\w*|(g|G)lamour\\w*|(g|G)loom\\w*|(g|G)lori\\w*|(g|G)lory(?!\\w)|(g|G)oddam\\w*|(g|G)ood(?!\\w)|(g|G)oodness(?!\\w)|(g|G)orgeous\\w*|(g|G)ossip\\w*|(g|G)race(?!\\w)|(g|G)raced(?!\\w)|(g|G)raceful\\w*|(g|G)races(?!\\w)|(g|G)raci\\w*|(g|G)rand(?!\\w)|(g|G)rande\\w*|(g|G)ratef\\w*|(g|G)rati\\w*|(g|G)rave\\w*|(g|G)reat(?!\\w)|(g|G)reed\\w*|(g|G)rief(?!\\w)|(g|G)riev\\w*|(g|G)rim\\w*|(g|G)rin(?!\\w)|(g|G)rinn\\w*|(g|G)rins(?!\\w)|(g|G)ross\\w*|(g|G)rouch\\w*|(g|G)rr\\w*|(g|G)uilt\\w*|(h|H)a(?!\\w)|(h|H)aha\\w*|(h|H)andsom\\w*|(h|H)appi\\w*|(h|H)appy(?!\\w)|(h|H)arass\\w*|(h|H)arm(?!\\w)|(h|H)armed(?!\\w)|(h|H)armful\\w*|(h|H)arming(?!\\w)|(h|H)armless\\w*|(h|H)armon\\w*|(h|H)arms(?!\\w)|(h|H)ate(?!\\w)|(h|H)ated(?!\\w)|(h|H)ateful\\w*|(h|H)ater\\w*|(h|H)ates(?!\\w)|(h|H)ating(?
!\\w)|(h|H)atred(?!\\w)|(h|H)azy(?!\\w)|(h|H)eartbreak\\w*|(h|H)eartbroke\\w*|(h|H)eartfelt(?!\\w)|(h|H)eartless\\w*|(h|H)eartwarm\\w*|(h|H)eaven\\w*|(h|H)eh\\w*|(h|H)ell(?!\\w)|(h|H)ellish(?!\\w)|(h|H)elper\\w*|(h|H)elpful\\w*|(h|H)elping(?!\\w)|(h|H)elpless\\w*|(h|H)elps(?!\\w)|(h|H)ero\\w*|(h|H)esita\\w*|(h|H)ilarious(?!\\w)|(h|H)oho\\w*|(h|H)omesick\\w*|(h|H)onest\\w*|(h|H)onor\\w*|(h|H)onour\\w*|(h|H)ope(?!\\w)|(h|H)oped(?!\\w)|(h|H)opeful(?!\\w)|(h|H)opefully(?!\\w)|(h|H)opefulness(?!\\w)|(h|H)opeless\\w*|(h|H)opes(?!\\w)|(h|H)oping(?!\\w)|(h|H)orr\\w*|(h|H)ostil\\w*|(h|H)ug(?!\\w)|(h|H)ugg\\w*|(h|H)ugs(?!\\w)|(h|H)umiliat\\w*|(h|H)umor\\w*|(h|H)umour\\w*|(h|H)urra\\w*|(h|H)urt\\w*|(i|I)deal\\w*|(i|I)diot(?!\\w)|(i|I)gnor\\w*|(i|I)mmoral\\w*|(i|I)mpatien\\w*|(i|I)mpersonal(?!\\w)|(i|I)mpolite\\w*|(i|I)mportan\\w*|(i|I)mpress\\w*|(i|I)mprove\\w*|(i|I)mproving(?!\\w)|(i|I)nadequa\\w*|(i|I)ncentive\\w*|(i|I)ndecis\\w*|(i|I)neffect\\w*|(i|I)nferior\\w* |(i|I)nhib\\w*|(i|I)nnocen\\w*|(i|I)nsecur\\w*|(i|I)nsincer\\w*|(i|I)nspir\\w*|(i|I)nsult\\w*|(i|I)ntell\\w*|(i|I)nterest\\w*|(i|I)nterrup\\w*|(i|I)ntimidat\\w*|(i|I)nvigor\\w*|(i|I)rrational\\w*|(i|I)rrita\\w*|(i|I)solat\\w*|(j|J)aded(?!\\w)|(j|J)ealous\\w*|(j|J)erk(?!\\w)|(j|J)erked(?!\\w)|(j|J)erks(?!\\w)|(j|J)oke\\w*|(j|J)oking(?!\\w)|(j|J)oll\\w*|(j|J)oy\\w*|(k|K)een\\w*|(k|K)idding(?!\\w)|(k|K)ill\\w*|(k|K)ind(?!\\w)|(k|K)indly(?!\\w)|(k|K)indn\\w*|(k|K)iss\\w*|(l|L)aidback(?!\\w)|(l|L)ame\\w*|(l|L)augh\\w*|(l|L)azie\\w*|(l|L)azy(?!\\w)|(l|L)iabilit\\w*|(l|L)iar\\w*|(l|L)ibert\\w*|(l|L)ied(?!\\w)|(l|L)ies(?!\\w)|(l|L)ike(?!\\w)|(l|L)ikeab\\w*|(l|L)iked(?!\\w)|(l|L)ikes(?!\\w)|(l|L)iking(?!\\w)|(l|L)ivel\\w*|(L|L)MAO(?!\\w)|(L|L)OL(?!\\w)|(l|L)one\\w*|(l|L)onging\\w*|(l|L)ose(?!\\w)|(l|L)oser\\w*|(l|L)oses(?!\\w)|(l|L)osing(?!\\w)|(l|L)oss\\w*|(l|L)ost(?!\\w)|(l|L)ous\\w*|(l|L)ove(?!\\w)|(l|L)oved(?!\\w)|(l|L)ovely(?!\\w)|(l|L)over\\w*|(l|L)oves(?!\\w)|(l|L)oving\\w*|(l|L)ow\\w*|(l|L)oyal\\w*|(l|L)uck(?!\\w)|(l
|L)ucked(?!\\w)|(l|L)ucki\\w*|(l|L)uckless\\w*|(l|L)ucks(?!\\w)|(l|L)ucky(?!\\w)|(l|L)udicrous\\w*|(l|L)ying(?!\\w)|(m|M)ad(?!\\w)|(m|M)addening(?!\\w)|(m|M)adder(?!\\w)|(m|M)addest(?!\\w)|(m|M)adly(?!\\w)|(m|M)agnific\\w*|(m|M)aniac\\w*|(m|M)asochis\\w*|(m|M)elanchol\\w*|(m|M)erit\\w*|(m|M)err\\w*|(m|M)ess(?!\\w)|(m|M)essy(?!\\w)|(m|M)iser\\w*|(m|M)iss(?!\\w)|(m|M)issed(?!\\w)|(m|M)isses(?!\\w)|(m|M)issing(?!\\w)|(m|M)istak\\w*|(m|M)ock(?!\\w)|(m|M)ocked(?!\\w)|(m|M)ocker\\w*|(m|M)ocking(?!\\w)|(m|M)ocks(?!\\w)|(m|M)olest\\w*|(m|M)ooch\\w*|(m|M)ood(?!\\w)|(m|M)oodi\\w*|(m|M)oods(?!\\w)|(m|M)oody(?!\\w)|(m|M)oron\\w*|(m|M)ourn\\w*|(m|M)urder\\w*|(n|N)ag\\w*|(n|N)ast\\w*|(n|N)eat\\w*|(n|N)eedy(?!\\w)|(n|N)eglect\\w*|(n|N)erd\\w*|(n|N)ervous\\w*|(n|N)eurotic\\w*|(n|N)ice\\w*|(n|N)umb\\w*|(n|N)urtur\\w*|(o|O)bnoxious\\w*|(o|O)bsess\\w*|(o|O)ffence\\w*|(o|O)ffend\\w*|(o|O)ffens\\w*|(o|O)k(?!\\w)|(o|O)kay(?!\\w)|(o|O)kays(?!\\w)|(o|O)ks(?!\\w)|(o|O)penminded\\w*|(o|O)penness(?!\\w)|(o|O)pportun\\w*|(o|O)ptimal\\w*|(o|O)ptimi\\w*|(o|O)riginal(?!\\w)|(o|O)utgoing(?!\\w)|(o|O)utrag\\w*|(o|O)verwhelm\\w*|(p|P)ain(?!\\w)|(p|P)ained(?!\\w)|(p|P)ainf\\w*|(p|P)aining(?!\\w)|(p|P)ainl\\w*|(p|P)ains(?!\\w)|(p|P)alatabl\\w*|(p|P)anic\\w*|(p|P)aradise(?!\\w)|(p|P)aranoi\\w*|(p|P)artie\\w*|(p|P)arty\\w*|(p|P)assion\\w*|(p|P)athetic\\w*|(p|P)eace\\w*|(p|P)eculiar\\w*|(p|P)erfect\\w*|(p|P)ersonal(?!\\w)|(p|P)erver\\w*|(p|P)essimis\\w*|(p|P)etrif\\w*|(p|P)ettie\\w*|(p|P)etty\\w*|(p|P)hobi\\w*|(p|P)iss\\w*|(p|P)iti\\w*|(p|P)ity\\w* 
|(p|P)lay(?!\\w)|(p|P)layed(?!\\w)|(p|P)layful\\w*|(p|P)laying(?!\\w)|(p|P)lays(?!\\w)|(p|P)leasant\\w*|(p|P)lease\\w*|(p|P)leasing(?!\\w)|(p|P)leasur\\w*|(p|P)oison\\w*|(p|P)opular\\w*|(p|P)ositiv\\w*|(p|P)rais\\w*|(p|P)recious\\w*|(p|P)rejudic\\w*|(p|P)ressur\\w*|(p|P)rettie\\w*|(p|P)retty(?!\\w)|(p|P)rick\\w*|(p|P)ride(?!\\w)|(p|P)rivileg\\w*|(p|P)rize\\w*|(p|P)roblem\\w*|(p|P)rofit\\w*|(p|P)romis\\w*|(p|P)rotest(?!\\w)|(p|P)rotested(?!\\w)|(p|P)rotesting(?!\\w)|(p|P)roud\\w*|(p|P)uk\\w*|(p|P)unish\\w*|(r|R)adian\\w*|(r|R)age\\w*|(r|R)aging(?!\\w)|(r|R)ancid\\w*|(r|R)ape\\w*|(r|R)aping(?!\\w)|(r|R)apist\\w*|(r|R)eadiness(?!\\w)|(r|R)eady(?!\\w)|(r|R)eassur\\w*|(r|R)ebel\\w*|(r|R)eek\\w*|(r|R)egret\\w*|(r|R)eject\\w*|(r|R)elax\\w*|(r|R)elief(?!\\w)|(r|R)eliev\\w*|(r|R)eluctan\\w*|(r|R)emorse\\w*|(r|R)epress\\w*|(r|R)esent\\w*|(r|R)esign\\w*|(r|R)esolv\\w*|(r|R)espect (?!\\w)|(r|R)estless\\w*|(r|R)evenge\\w*|(r|R)evigor\\w*|(r|R)eward\\w*|(r|R)ich\\w*|(r|R)idicul\\w*|(r|R)igid\\w*|(r|R)isk\\w*|(R|R)OFL(?!\\w)|(r|R)omanc\\w*|(r|R)omantic\\w*|(r|R)otten(?!\\w)|(r|R)ude\\w*|(r|R)uin\\w*|(s|S)ad(?!\\w)|(s|S)adde\\w*|(s|S)adly(?!\\w)|(s|S)adness(?!\\w)|(s|S)afe\\w*|(s|S)arcas\\w*|(s|S)atisf\\w*|(s|S)avage\\w*|(s|S)ave(?!\\w)|(s|S)care\\w*|(s|S)caring(?!\\w)|(s|S)cary(?!\\w)|(s|S)ceptic\\w*|(s|S)cream\\w*|(s|S)crew\\w*|(s|S)ecur\\w*|(s|S)elfish\\w*|(s|S)entimental\\w*|(s|S)erious(?!\\w)|(s|S)eriously(?!\\w)|(s|S)eriousness(?!\\w)|(s|S)evere\\w*|(s|S)hake\\w*|(s|S)haki\\w*|(s|S)haky(?!\\w)|(s|S)hame\\w*|(s|S)hare(?!\\w)|(s|S)hared(?!\\w)|(s|S)hares(?!\\w)|(s|S)haring(?!\\w)|(s|S)hit\\w*|(s|S)hock\\w*|(s|S)hook(?!\\w)|(s|S)hy\\w*|(s|S)icken\\w*|(s|S)igh(?!\\w)|(s|S)ighed(?!\\w)|(s|S)ighing(?!\\w)|(s|S)ighs(?!\\w)|(s|S)illi\\w*|(s|S)illy(?!\\w)|(s|S)in(?!\\w)|(s|S)incer\\w*|(s|S)inister(?!\\w)|(s|S)ins(?!\\w)|(s|S)keptic\\w*|(s|S)lut\\w*|(s|S)mart\\w*|(s|S)mil\\w*|(s|S)mother\\w*|(s|S)mug\\w*|(s|S)nob\\w*|(s|S)ob(?!\\w)|(s|S)obbed(?!\\w)|(s|S)obbing(?!\\w)|(s|S)obs(?!\\w)|(
s|S)ociab\\w*|(s|S)olemn\\w*|(s|S)orrow\\w*|(s|S)orry(?!\\w)|(s|S)oulmate\\w*|(s|S)pecial(?!\\w)|(s|S)pite\\w*|(s|S)plend\\w*|(s|S)tammer\\w*|(s|S)tank(?!\\w)|(s|S)tartl\\w*|(s|S)teal\\w*|(s|S)tench(?!\\w)|(s|S)tink\\w*|(s|S)train\\w*|(s|S)trange(?!\\w)|(s|S)trength\\w*|(s|S)tress\\w*|(s|S)trong\\w*|(s|S)truggl\\w*|(s|S)tubborn\\w*|(s|S)tunk(?!\\w)|(s|S)tunned(?!\\w)|(s|S)tuns(?!\\w)|(s|S)tupid\\w*|(s|S)tutter\\w*|(s|S)ubmissive\\w*|(s|S)ucceed\\w*|(s|S)uccess\\w*|(s|S)uck(?!\\w)|(s|S)ucked(?!\\w)|(s|S)ucker\\w*|(s|S)ucks(?!\\w)|(s|S)ucky(?!\\w)|(s|S)uffer(?!\\w)|(s|S)uffered(?!\\w)|(s|S)ufferer\\w*|(s|S)uffering(?!\\w)|(s|S)uffers(?!\\w)|(s|S)unnier(?!\\w)|(s|S)unniest(?!\\w)|(s|S)unny(?!\\w)|(s|S)unshin\\w*|(s|S)uper(?!\\w)|(s|S)uperior\\w*|(s|S)upport(?!\\w)|(s|S)upported(?!\\w)|(s|S)upporter\\w*|(s|S)upporting(?!\\w)|(s|S)upportive\\w*|(s|S)upports(?!\\w)|(s|S)uprem\\w*|(s|S)ure\\w*|(s|S)urpris\\w*|(s|S)uspicio\\w*|(s|S)weet(?!\\w)|(s|S)weetheart\\w*|(s|S)weetie\\w*|(s|S)weetly(?!\\w)|(s|S)weetness\\w*|(s|S)weets(?!\\w)|(t|T)alent\\w*|(t|T)antrum\\w*|(t|T)ears(?!\\w)|(t|T)eas\\w*|(t|T)ehe(?!\\w)|(t|T)emper(?!\\w)|(t|T)empers(?!\\w)|(t|T)ender\\w*|(t|T)ense\\w*|(t|T)ensing(?!\\w)|(t|T)ension\\w*|(t|T)erribl\\w*|(t|T)errific\\w*|(t|T)errified(?!\\w)|(t|T)errifies(?!\\w)|(t|T)errify (?!\\w)|(t|T)errifying(?!\\w)|(t|T)error\\w*|(t|T)hank(?!\\w)|(t|T)hanked(?!\\w)|(t|T)hankf\\w*|(t|T)hanks(?!\\w)|(t|T)hief(?!\\w)|(t|T)hieve\\w*|(t|T)houghtful\\w*|(t|T)hreat\\w*|(t|T)hrill\\w*|(t|T)icked(?!\\w)|(t|T)imid\\w*|(t|T)oleran\\w*|(t|T)ortur\\w*|(t|T)ough\\w*|(t|T)raged\\w*|(t|T)ragic\\w* |(t|T)ranquil\\w*|(t|T)rauma\\w*|(t|T)reasur\\w*|(t|T)reat(?!\\w)|(t|T)rembl\\w*|(t|T)rick\\w*|(t|T)rite(?!\\w)|(t|T)riumph\\w*|(t|T)rivi\\w*|(t|T)roubl\\w*|(t|T)rue 
(?!\\w)|(t|T)rueness(?!\\w)|(t|T)ruer(?!\\w)|(t|T)ruest(?!\\w)|(t|T)ruly(?!\\w)|(t|T)rust\\w*|(t|T)ruth\\w*|(t|T)urmoil(?!\\w)|(u|U)gh(?!\\w)|(u|U)gl\\w*|(u|U)nattractive(?!\\w)|(u|U)ncertain\\w*|(u|U)ncomfortabl\\w*|(u|U)ncontrol\\w*|(u|U)neas\\w*|(u|U)nfortunate\\w*|(u|U)nfriendly(?!\\w)|(u|U)ngrateful\\w*|(u|U)nhapp\\w*|(u|U)nimportant(?!\\w)|(u|U)nimpress\\w*|(u|U)nkind(?!\\w)|(u|U)nlov\\w*|(u|U)npleasant(?!\\w)|(u|U)nprotected(?!\\w)|(u|U)nsavo\\w*|(u|U)nsuccessful\\w*|(u|U)nsure\\w*|(u|U)nwelcom\\w*|(u|U)pset\\w*|(u|U)ptight\\w*|(u|U)seful\\w*|(u|U)seless\\w* |(v|V)ain(?!\\w)|(v|V)aluabl\\w*|(v|V)alue(?!\\w)|(v|V)alued(?!\\w)|(v|V)alues(?!\\w)|(v|V)aluing(?!\\w)|(v|V)anity(?!\\w)|(v|V)icious\\w*|(v|V)ictim\\w*|(v|V)igor\\w*|(v|V)igour\\w*|(v|V)ile(?!\\w)|(v|V)illain\\w*|(v|V)iolat\\w*|(v|V)iolent\\w*|(v|V)irtue\\w*|(v|V)irtuo\\w*|(v|V)ital\\w*|(v|V)ulnerab\\w*|(v|V)ulture\\w*|(w|W)ar(?!\\w)|(w|W)arfare\\w*|(w|W)arm\\w*|(w|W)arred(?!\\w)|(w|W)arring(?!\\w)|(w|W)ars(?!\\w)|(w|W)eak\\w*|(w|W)ealth\\w*|(w|W)eapon\\w*|(w|W)eep\\w*|(w|W)eird\\w*|(w|W)elcom\\w*|(w|W)ell\\w*|(w|W)ept(?!\\w)|(w|W)hine\\w*|(w|W)hining(?!\\w)|(w|W)hore\\w*|(w|W)icked\\w*|(w|W)illing(?!\\w)|(w|W)imp\\w*|(w|W)in(?!\\w)|(w|W)inn\\w*|(w|W)ins(?!\\w)|(w|W)isdom(?!\\w)|(w|W)ise\\w*|(w|W)itch(?!\\w)|(w|W)oe\\w*|(w|W)on(?!\\w)|(w|W)onderf\\w*|(w|W)orr\\w*|(w|W)orse\\w*|(w|W)orship\\w*|(w|W)orst(?!\\w)|(w|W)orthless\\w* |(w|W)orthwhile(?!\\w)|(w|W)ow\\w*|(w|W)rong\\w*|(y|Y)ay(?!\\w)|(y|Y)ays(?!\\w)|(y|Y)earn\\w*)"
Let’s assign this to an object called regex_affect:
regex_affect <- str_c("\\b(", str_flatten(dict_affect$regex, collapse = "|"), ")")
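To see what this construction does on a small scale, here is a minimal sketch with a made-up three-entry dictionary. The names toy_regex, toy_pattern, and toy_text are hypothetical and are not part of the lecture data; the entries only imitate the style of dict_affect$regex.

```r
library(stringr)

# Hypothetical three-entry dictionary in the same style as dict_affect$regex:
# "\\w*" allows any word ending, "(?!\\w)" requires the word to end right there.
toy_regex <- c("(h|H)app\\w*", "(s|S)ad(?!\\w)", "(f|F)ear(?!\\w)")

# Join the entries with "|" and wrap them in one group that starts at a word boundary
toy_pattern <- str_c("\\b(", str_flatten(toy_regex, collapse = "|"), ")")
toy_pattern

toy_text <- "Happy people fear sadness; sad people are fearless."

# "sadness" and "fearless" are skipped because of the (?!\\w) lookahead
toy_count <- str_count(toy_text, toy_pattern)
toy_matches <- str_extract_all(toy_text, toy_pattern)[[1]]
toy_count
toy_matches
```

Wrapping all alternatives in one group means the leading \\b applies to every entry, not only to the first one.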
Now, let’s apply this to the fifth video, whose speaker is the same one from the Brady et al. paper:
str_count(teded$Caption[5], regex_affect)
[1] 22
What are those 22 words?
str_extract_all(teded$Caption[5], regex_affect)
[[1]]
[1] "War" "war" "terrible" "argument" "fight"
[6] "feud" "disagreement" "loyal" "discouraged" "serious"
[11] "hostility" "personal" "free" "importantly" "free"
[16] "create" "interests" "benefit" "shared" "serious"
[21] "positive" "easily"
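Some words appear more than once in the output above (for example "free" and "serious"), and "War" and "war" differ only in case. One possible way to summarize the matches is to lowercase them and tabulate the result. The sketch below uses a made-up caption and a shortened pattern as stand-ins for teded$Caption[5] and regex_affect, so the counts are illustrative only.

```r
library(stringr)

# Hypothetical stand-ins for teded$Caption[5] and regex_affect
caption <- "War is terrible. The war left them free, and free people fight."
pattern <- "\\b((w|W)ar(?!\\w)|(t|T)erribl\\w*|(f|F)ree(?!\\w)|(f|F)ight\\w*)"

# str_extract_all() returns a list with one element per input string
matches <- str_extract_all(caption, pattern)[[1]]

# Lowercase so that "War" and "war" are counted together, then tabulate
match_counts <- sort(table(str_to_lower(matches)), decreasing = TRUE)
match_counts
```

With the real data, the same two lines could be applied to the result of str_extract_all(teded$Caption[5], regex_affect).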
In future lessons, we will explore how to use the quanteda package, which is faster.
Exercise
- How many transcripts have more than 2,000 words?
- Create a new variable called video_id by extracting the string after = in the YouTube links.