Pandas study - pandas tutorial (7)


Ethan's Values


Ethan_hyk 2023. 10. 19. 18:07

Working with text data

Text columns are handled through the str accessor and the string methods it exposes.

- Convert all names to lowercase

In [4]: titanic["Name"].str.lower()
Out[4]: 
0                                braund, mr. owen harris
1      cumings, mrs. john bradley (florence briggs th...
2                                 heikkinen, miss. laina
3           futrelle, mrs. jacques heath (lily may peel)
4                               allen, mr. william henry
                             ...                        
886                                montvila, rev. juozas
887                         graham, miss. margaret edith
888             johnston, miss. catherine helen "carrie"
889                                behr, mr. karl howell
890                                  dooley, mr. patrick
Name: Name, Length: 891, dtype: object
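As a small self-contained sketch of the same idea, the snippet below applies str.lower() to a short hand-made Series instead of the Titanic data. The sample values and variable names are assumptions for illustration only; the missing entry is included to show that the str methods pass missing values through rather than raising an error.

```python
import pandas as pd

# Hypothetical sample standing in for the titanic["Name"] column.
names = pd.Series(["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina", None])

# Each str method is applied element by element; missing values stay missing.
lowered = names.str.lower()

print(lowered)
```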

Splitting a string with split returns a list for each row, with one element for each piece between delimiters.

In [5]: titanic["Name"].str.split(",")
Out[5]: 
0                             [Braund,  Mr. Owen Harris]
1      [Cumings,  Mrs. John Bradley (Florence Briggs ...
2                              [Heikkinen,  Miss. Laina]
3        [Futrelle,  Mrs. Jacques Heath (Lily May Peel)]
4                            [Allen,  Mr. William Henry]
                             ...                        
886                             [Montvila,  Rev. Juozas]
887                      [Graham,  Miss. Margaret Edith]
888          [Johnston,  Miss. Catherine Helen "Carrie"]
889                             [Behr,  Mr. Karl Howell]
890                               [Dooley,  Mr. Patrick]
Name: Name, Length: 891, dtype: object
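The output above also shows that the space after the comma stays attached to the second piece. The following sketch, on a hypothetical two-row Series, illustrates this and shows one possible alternative: passing expand=True so that the pieces land in separate columns instead of lists.

```python
import pandas as pd

# Hypothetical sample; the names mirror the format of the Titanic data.
names = pd.Series(["Braund, Mr. Owen Harris", "Allen, Mr. William Henry"])

# Default behaviour: one list per row.
parts = names.str.split(",")

# expand=True returns a DataFrame with one column per piece.
as_columns = names.str.split(",", expand=True)

# The second piece keeps its leading space, so strip it if needed.
rest = as_columns[1].str.strip()

print(as_columns)
```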

To take only the first element after splitting, apply the str accessor once more to the result of split and call get(0).

In [6]: titanic["Surname"] = titanic["Name"].str.split(",").str.get(0)

In [7]: titanic["Surname"]
Out[7]: 
0         Braund
1        Cumings
2      Heikkinen
3       Futrelle
4          Allen
         ...    
886     Montvila
887       Graham
888     Johnston
889         Behr
890       Dooley
Name: Surname, Length: 891, dtype: object
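For comparison, here is a minimal sketch on a hypothetical two-row DataFrame. It assumes nothing beyond pandas itself and shows that indexing the accessor with str[0] gives the same result as str.get(0), and that asking for a position that does not exist yields a missing value instead of an error.

```python
import pandas as pd

# Hypothetical sample in the same "Surname, Title. Given names" format.
df = pd.DataFrame({"Name": ["Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley"]})

parts = df["Name"].str.split(",")

# Two equivalent ways to pick the first element of every list.
df["Surname"] = parts.str.get(0)
surname_alt = parts.str[0]

# A position beyond the end of the list gives a missing value.
out_of_range = parts.str.get(5)

print(df["Surname"])
```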

Checking whether a string contains a value - contains()

In [8]: titanic["Name"].str.contains("Countess")
Out[8]: 
0      False
1      False
2      False
3      False
4      False
       ...  
886    False
887    False
888    False
889    False
890    False
Name: Name, Length: 891, dtype: bool




In [9]: titanic[titanic["Name"].str.contains("Countess")]
Out[9]: 
     PassengerId  Survived  Pclass  ... Cabin Embarked  Surname
759          760         1       1  ...   B77        S   Rothes

[1 rows x 13 columns]
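The two steps above (build a boolean Series, then use it to filter rows) can be sketched on a small hypothetical DataFrame. The rows are made up for illustration. The na=False and regex=False arguments are optional extras not used in the original: the first treats missing names as non-matches, the second makes the pattern a literal string rather than a regular expression.

```python
import pandas as pd

# Hypothetical rows; only the "Name" format follows the Titanic data.
df = pd.DataFrame(
    {"Name": ["Doe, the Countess. Jane", "Behr, Mr. Karl Howell", None]}
)

# Boolean mask: True where the name contains "Countess".
# na=False makes missing names count as "no match".
mask = df["Name"].str.contains("Countess", na=False)

# Use the mask to keep only the matching rows.
matches = df[mask]

# regex=False treats "Mr." literally, so "." is not a wildcard.
literal = df["Name"].str.contains("Mr.", regex=False, na=False)

print(matches)
```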

Finding the row with the longest name (string)

- First, get the length of each value with the len function.

In [10]: titanic["Name"].str.len()
Out[10]: 
0      23
1      51
2      22
3      44
4      24
       ..
886    21
887    28
888    40
889    21
890    19
Name: Name, Length: 891, dtype: int64

- Use idxmax() to get the index label of the row with the largest length.

In [11]: titanic["Name"].str.len().idxmax()
Out[11]: 307

- Use loc to look up the name stored at that index label.

In [12]: titanic.loc[titanic["Name"].str.len().idxmax(), "Name"]
Out[12]: 'Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)'
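The three steps can be put together as in the sketch below. It uses three names taken from the outputs earlier in this post, but the custom index (10, 20, 30) is an assumption added to make one point visible: idxmax() returns an index label, not a position, which is why loc is the matching lookup.

```python
import pandas as pd

# Three names from the examples above, with a hypothetical non-default index.
df = pd.DataFrame(
    {"Name": ["Allen, Mr. William Henry", "Dooley, Mr. Patrick", "Graham, Miss. Margaret Edith"]},
    index=[10, 20, 30],
)

# Step 1: length of every name.
lengths = df["Name"].str.len()

# Step 2: index label of the longest name.
longest_label = lengths.idxmax()

# Step 3: label-based lookup with loc.
longest_name = df.loc[longest_label, "Name"]

print(longest_label, longest_name)
```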

Modifying (replacing) strings

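Pass a dictionary of the form {from: to} to replace(). Note that this is the replace method of the Series itself, which matches whole values, not the str accessor.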

In [13]: titanic["Sex_short"] = titanic["Sex"].replace({"male": "M", "female": "F"})

In [14]: titanic["Sex_short"]
Out[14]: 
0      M
1      F
2      F
3      F
4      M
      ..
886    M
887    F
888    F
889    M
890    M
Name: Sex_short, Length: 891, dtype: object
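To make the difference between the two replace methods concrete, here is a short sketch on a hypothetical four-value Series. Series.replace with a dictionary swaps whole values, while str.replace works on substrings, which produces a surprising result for "female" because it contains "male".

```python
import pandas as pd

# Hypothetical sample standing in for the titanic["Sex"] column.
sex = pd.Series(["male", "female", "female", "male"])

# Whole-value replacement with a {from: to} dictionary.
sex_short = sex.replace({"male": "M", "female": "F"})

# Substring replacement: "female" contains "male", so it becomes "feM".
substring = sex.str.replace("male", "M", regex=False)

print(sex_short.tolist())
print(substring.tolist())
```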
