
[MongoDB] [Study-12] Authentication & Role Summary

2021. 4. 16. 01:22

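MongoDB Security

MongoDB security can be broadly divided into five areas. This post covers only authentication, authorization, and the tests I ran.
Authentication
Authorization
Encryption - TDE - provided by MongoDB Enterprise
Auditing - provided by MongoDB Enterprise and Percona
Data Governance - concerns keeping data consistent, so the book this study follows does not cover it

Authentication

Internal authentication

Authentication used for communication between MongoDB servers and MongoDB router servers (and between the members of a replica set)
Internal authentication can use one of two methods: a key file or an x.509 certificate

To enable authentication, the authentication-related options must be enabled in the configuration file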

$ /etc/mongod.conf

security : 
  authorization: enabled   # enables both internal and user authentication (the two cannot be configured separately)

# Key file creation is covered in an earlier post.
# Existing key file and auth settings
  clusterAuthMode : keyFile
  keyFile : /etc/mongod.key

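A keyFile is a password file made of a simple plain-text string that the MongoDB server uses for internal authentication
Points to watch when creating a keyFile

The file must be shared by every MongoDB instance in the cluster: copy the same file to the server of every member

The keyFile must be readable by the MongoDB server process
The keyFile permissions must be 600 or 700 so that only the file owner can access it
Whitespace in the keyFile content is ignored automatically
The keyFile must contain between 6 and 1024 characters, and only characters from the Base64 set are allowed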
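
The content rules above can be checked mechanically before the file is copied to each member. The following is a minimal sketch in plain JavaScript (runnable with Node.js); `validateKeyFileContent` is a hypothetical helper, not a MongoDB API, and it assumes that the padding character `=` counts as part of the Base64 set.

```javascript
// Hypothetical helper (not part of MongoDB): checks a candidate keyFile body
// against the content rules listed above.
// Assumption: "=" (Base64 padding) is accepted as part of the Base64 set.
const BASE64_CHARS = /^[A-Za-z0-9+/=]+$/;

function validateKeyFileContent(raw) {
  // Whitespace is ignored, so strip it before checking anything else.
  const key = raw.replace(/\s+/g, "");
  if (key.length < 6 || key.length > 1024) {
    return { ok: false, reason: "length must be between 6 and 1024 characters" };
  }
  if (!BASE64_CHARS.test(key)) {
    return { ok: false, reason: "only characters from the Base64 set are allowed" };
  }
  return { ok: true, key };
}

console.log(validateKeyFileContent("abc def\nGHI123+/")); // accepted after stripping whitespace
console.log(validateKeyFileContent("abc"));                // rejected: too short
console.log(validateKeyFileContent("not-valid!"));         // rejected: "-" and "!" are not Base64
```

File ownership and permissions still have to be checked separately at the OS level; this sketch only looks at the content.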

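User authentication

Used when an application outside the MongoDB server tries to connect through a MongoDB client driver
A user must be created after switching to a specific database; that database is called the authentication database

A user can hold privileges on several databases but has only one authentication database (the one used to log in). If a user is created in admin and later granted privileges on test, test cannot serve as its authentication database; login always goes through admin first
To enable it, edit /etc/mongod.conf. To authenticate communication between cluster members, the clusterAuthMode and keyFile options must be added as well
User information can be checked with db.system.users.find().pretty() (the system.users collection lives in the admin database)
Even with the same user name and password, accounts created in different databases (the mysns and myblog createUser calls below) are treated by MongoDB as two separate accounts
To give a single account privileges on several databases, list the extra roles at creation time or add them with db.grantRolesToUser(), as in the last two examples below

Configuration for simple user authentication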

security:
   authorization : enabled

use mysns
db.createUser({user:"user",pwd:"password", roles:["readWrite" ] })

use myblog
db.createUser({user:"user",pwd:"password", roles:["readWrite" ] })

# Check that the users were created
show users

use mysns
db.createUser({user:"user",pwd:"password", roles:[ "readWrite", {role:"readWrite", db:"myblog" }  ] })

or
use mysns
db.createUser({user:"user",pwd:"password", roles:["readWrite" ] })
db.grantRolesToUser("user", [{ role:"readWrite", db:"myblog" }])

Check the created users
db.system.users.find().pretty()

External authentication

Authenticates users through LDAP or Active Directory; available only in the Enterprise edition
See this blog post for an explanation of LDAP ( https://jabcholove.tistory.com/89 )
Percona Server for MongoDB SASL (explained in "Percona Server for MongoDB Authentication Using Active Directory" - https://www.percona.com/blog/2018/12/21/percona-server-for-mongodb-authentication-using-active-directory/ )

This article will walk you through using the SASL library to allow your Percona Server for MongoDB instance to authenticate with your company’s Active Directory server. Percona Server for MongoDB includes enterprise level features, such as LDAP authentication, audit logging and with the 3.6.8 release a beta version of data encryption at rest, all in its open source offering.

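Authorization

Actions

Each unit of work that occurs while a command runs is split out, so that a MongoDB command is processed as a set of one or more unit actions
The predefined actions are numerous and vary by version, with many added or removed, so check the MongoDB documentation for your version directly
An action is the smallest unit of privilege, and running an ordinary command usually requires several actions (for example, an aggregate command may need the find / insert / bypassDocumentValidation actions, such as when the pipeline writes its output)

Built-in roles

Roles that MongoDB defines by default; each is a collection of actions
ex) The built-in read role consists of the actions collStats, dbHash, dbStats, find, killCursors, listIndexes, listCollections
Creating a user account that has the readWrite role on the mysns database and only the read role on the myblog database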

mongodb > use mysns
mongodb > db.createUser({user:"mysns_user",pwd:"mypassword",roles:["readWrite",{role:"read",db:"myblog"}]})

cf.) Whether a user whose authentication database is admin can be given database-level privileges only

The user can be created in either of the ways below
Login is possible only through admin. After that, only the granted privileges can be used

mongodb > use admin
mongodb > db.createUser({user:"testadmin",pwd:"testadmin!23",roles:[{ role:"readWrite", db:"test" }]})

or

mongodb > db.createUser({user:"testadmin",pwd:"testadmin!23",roles:[]})
mongodb > db.grantRolesToUser("testadmin", [{ role:"readWrite", db:"test" }])

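dbOwner : can perform administrative actions on the database; combines the privileges of the readWrite, dbAdmin, and userAdmin roles
-> Granting dbOwner on the admin database effectively gives superuser access (the same holds for userAdmin)
dbAdmin : can perform schema-related tasks and the like, but cannot grant privileges
userAdmin : can create and modify roles and users on the current database and grant privileges to users. Granting userAdmin on admin gives superuser access (db/cluster)
-> Can grant cluster-wide roles or privileges, including userAdminAnyDatabase

Conclusion : dbOwner ⊇ userAdmin ≠ dbAdmin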

Superuser > root : readWriteAnyDatabase, dbAdminAnyDatabase, userAdminAnyDatabase, clusterAdmin, restore, and backup combined.

Role type	Role	Description	Excluded	Details	To check
Database User Roles	read	Read	Excludes system collections	Handles actions such as changeStream, collStats, dbHash, dbStats, find, killCursors, listIndexes, listCollections
Database User Roles	readWrite	Read + write	Excludes system collections	The read actions plus convertToCapped, createCollection, dropCollection, createIndex, dropIndex, emptycapped, insert, remove, renameCollectionSameDB, update
Database Administration Roles	dbAdmin	Indexing, statistics gathering, and similar administrative tasks	Excludes user and role management	collStats, dbHash, dbStats, find, killCursors, listIndexes, listCollections, plus dropCollection and createCollection on system.profile only	Technical PM : grant dbAdmin + readWrite
	dbOwner	readWrite + dbAdmin + userAdmin
	userAdmin	Creates or changes users and roles on the database; with userAdmin on admin, a superuser can be created	WARNING It is important to understand the security implications of granting the userAdmin role: a user with this role for a database can assign themselves any privilege on that database. Granting the userAdmin role on the admin database has further security implications as this indirectly provides superuser access to a cluster. With admin scope a user with the userAdmin role can grant cluster-wide roles or privileges including userAdminAnyDatabase.	changeCustomData, changePassword, createRole, createUser, dropRole, dropUser, grantRole, revokeRole, setAuthenticationRestriction, viewRole, viewUser
Cluster Administration Roles	clusterAdmin	Highest cluster privilege: clusterManager + clusterMonitor + hostManager + the dropDatabase action			Check whether dbOwner includes cluster privileges (if it does, do not use this role); if dbAdmin > clusterManager, grant only dbAdmin to the PM
	clusterManager	Can access the config and local databases; provides management and monitoring of cluster actions		On the cluster: appendOplogNote, cleanupOrphaned, listSessions (new in 3.6), removeShard, replSetGetConfig, replSetStateChange, resync, replSetGetStatus, replSetConfigure, listShards, flushRouterConfig, applicationMessage, addShard. On all databases in the cluster: moveChunk, splitVector, splitChunk, enableSharding
	clusterMonitor	Limited to monitoring; read-only access		On the cluster: connPoolStats, getLog, getShardMap, inprog, listSessions (new in 3.6), netstat, replSetGetStatus, setFreeMonitoring (new in 4.0), top, shardingState, serverStatus, replSetGetConfig, listShards, listDatabases, hostInfo, getParameter, getCmdLineOpts, checkFreeMonitoringStatus (new in 4.0). On all databases in the cluster: dbStats, indexStats, useUUID (new in 3.6), getShardVersion, collStats
	hostManager	Monitors and manages individual servers		On the cluster: closeAllDatabases, cpuProfiler, fsync, killAnyCursor (new in 4.0), killop, resync, shutdown, unlock, touch, setParameter, logRotate, killAnySession (new in 3.6), invalidateUserCache, flushRouterConfig, connPoolSync, applicationMessage. On all databases in the cluster: killCursors
Backup and Restoration Roles	backup	Runs backups with mongodump and similar tools
Backup and Restoration Roles	restore	Runs restores with mongorestore and similar tools	Excludes system.profile collection data
All-Database Roles	readAnyDatabase
	readWriteAnyDatabase
	userAdminAnyDatabase
	dbAdminAnyDatabase	Same privileges as dbAdmin (excluding local and config)
Superuser Roles	root	Superuser privileges	Some blogs say restore is excluded, but the official documentation says it is included (The root role includes privileges from the restore role.)	Equivalent superuser access also arises from: dbOwner granted on admin, userAdmin granted on admin, userAdminAnyDatabase
Internal Role	__system	Can take any action on any object in any database. It is meant for internal use and should not normally be assigned to users that represent applications or human administrators.

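User-defined roles

Users can define new roles that fit their own service or requirements
There are two scopes, determined by the database in which the role is created
Global role

A role created in the admin database is a global role

Database-level role

A role created in any database other than admin applies to that database only
Create a role with db.createRole(); add privileges with db.grantPrivilegesToRole() and remove them with db.revokePrivilegesFromRole()
Built-in roles or already defined roles can be added to or removed from a role with db.grantRolesToRole() and db.revokeRolesFromRole()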

MongoDB > use admin
MongoDB > db.createRole({
   role: "dev_mysns",
   privileges: [
     { resource: { db: "mysns", collection: ""}, actions: ["find", "update", "insert", "remove"]},
     { resource : {db: "myblog", collection: ""}, actions: ["find"]}
   ], roles: []
})

MongoDB > use admin
MongoDB > db.createRole({
   role: "dev_mysns",
   privileges: [],
   roles: [
      {role : "readWrite", db:"mysns"}
      ,{role : "read", db:"myblog"}
   ]
})
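
The two definitions above describe the same idea from different angles: one lists actions directly, the other inherits them from built-in roles. As a rough illustration, the sketch below flattens a role document into a per-database set of actions. It is an in-memory model only: `BUILT_IN_ROLES` and `effectiveActions` are hypothetical names, and the action lists are deliberately abbreviated rather than the real definitions.

```javascript
// Illustrative in-memory model only; the action lists are abbreviated and
// BUILT_IN_ROLES / effectiveActions are hypothetical names, not MongoDB APIs.
const BUILT_IN_ROLES = {
  read: ["find", "listCollections", "listIndexes"],
  readWrite: ["find", "listCollections", "listIndexes", "insert", "update", "remove"],
};

// Returns a Map of database name -> Set of actions for a role document
// shaped like the argument of db.createRole().
function effectiveActions(roleDoc) {
  const result = new Map();
  const grant = (db, actions) => {
    if (!result.has(db)) result.set(db, new Set());
    actions.forEach((a) => result.get(db).add(a));
  };
  // Privileges listed directly on the role.
  for (const p of roleDoc.privileges) grant(p.resource.db, p.actions);
  // Privileges inherited from (built-in) roles.
  for (const r of roleDoc.roles) grant(r.db, BUILT_IN_ROLES[r.role] || []);
  return result;
}

const devMysns = {
  role: "dev_mysns",
  privileges: [],
  roles: [
    { role: "readWrite", db: "mysns" },
    { role: "read", db: "myblog" },
  ],
};

const actions = effectiveActions(devMysns);
console.log([...actions.get("mysns")]);           // includes insert/update/remove
console.log(actions.get("myblog").has("insert")); // false: read-only on myblog
```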

Check the defined roles

MongoDB > use admin
MongoDB > db.system.roles.find().pretty()

Create a user and grant the user-defined role

MongoDB > use admin
MongoDB > db.createUser({user : "user", pwd: "mypassword", roles:["dev_mysns"] })

Test log

User creation and login check

# gadmin root admin (DB : admin) 

MongoDB Enterprise > use admin 
switched to db admin 

MongoDB Enterprise > db.createUser({user:"gadmin",pwd:"gadmin",roles:["root"]}) 
Successfully added user: { "user" : "gadmin", "roles" : [ "root" ] } 

# test db book collection document  
MongoDB Enterprise > use test 
switched to db test 

MongoDB Enterprise > db.book.insert({"name":"mongodb", "author":"hyungi"}) 
WriteResult({ "nInserted" : 1 }) 

MongoDB Enterprise > show dbs 
test 0.000GB 

MongoDB Enterprise > db.book.find() 
{ "_id" : ObjectId("5ddb78f35f326d0194f85cbc"), "name" : "mongodb", "author" : "hyungi" } 

# test DB lgtest readWrite  
MongoDB Enterprise > use test; 
MongoDB Enterprise > db.createUser({user:"lgtest",pwd:"lgtest!23",roles:["readWrite"]})

# test2 DB lgtest2 readWrite
MongoDB Enterprise > use test2
MongoDB Enterprise > db.book.insert({"name":"python", "author":"aca"})
MongoDB Enterprise > db.createUser({user:"lgtest2",pwd:"lgtest2!23", roles:["readWrite",{role:"readWrite",db:"test"}]})

# user
MongoDB Enterprise > show users

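Because the user was created with test2 as its authentication database, a login attempt authenticating against test failed

Access through the original authentication database, test2, worked

After logging in, the current database showed as test, so this needs checking (the mongo shell connects to test by default when no database is given; it can be confirmed with db.system.getDB() )

Logging in again through admin also connected to test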

Role Test

MongoDB Enterprise > db.createRole({
  role: "role_test",
  privileges: [
    { resource: { db: "test", collection: ""}, actions: ["find", "update", "insert", "remove"]},
    { resource : {db: "test2", collection: ""}, actions: ["find"]}
  ], roles: []
})
MongoDB Enterprise > db.system.roles.find().pretty()

Confirmed that the role cannot be granted from other databases, only from admin

MongoDB > use admin
MongoDB > db.createUser({user : "lgtest3", pwd: "lgtest3!23", roles:["role_test"] })
MongoDB > db.book.insert({"name":"mongodb2","author":"bkkim"})
MongoDB > db.book.insert({"name":"mysql2","author":"bkkim"})

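The authentication database is admin, but the privileges are on test (read/write) and test2 (read)

As expected, authentication works only against admin, and once connected the current database is test

The granted role carries no privileges on system metadata, so find only works with names you already know...

Collection and database listings cannot be viewed

The role works as intended

userAdmin role

userAdmin has no DML privileges such as readWrite, but it can grant privileges to users
Test for this behavior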

MongoDB Enterprise > use test
MongoDB Enterprise > db.createUser({user:"useradmin",pwd:"useradmin!23",roles:["userAdmin"]})
$> mongo -u "useradmin" -p --authenticationDatabase "test"

Test log

show dbs
show collections
db.book.find()
db.book.insert({"name":"useradmin","author":"roles"})
db.createUser({user:"roletest",pwd:"roletest!23", roles:["read"]})

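Databases / collections cannot be listed

find / insert are not allowed

Creating users and granting privileges works

dbAdmin Role

dbAdmin performs administrative tasks such as indexing and statistics gathering
Note that in the test below the role is granted by the useradmin account, which holds userAdmin (userAdmin can grant dbAdmin)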

MongoDB Enterprise > use test
MongoDB Enterprise > db.createUser({user:"dbadmin",pwd:"dbadmin!23",roles:["dbAdmin"]})
$> mongo -u "dbadmin" -p --authenticationDatabase "test"

Test log

show dbs
show collections
db.book.find()
db.book.insert({"name":"useradmin","author":"roles"})
db.book.createIndex({name:1})
db.createUser({user:"adminroletest",pwd:"adminroletest!23", roles:["read"]})

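An account with userAdmin can create a user that has the dbAdmin role

The userAdmin account can create accounts and verify that the user was created

Databases cannot be listed, but collections can

find and insert are not allowed

Index creation works

User creation is not allowed

Create and use dbAdmin accounts for purely administrative work

dbOwner role

dbOwner holds the readWrite, dbAdmin, and userAdmin roles and can perform administrative actions on the database
Note that in the test below the role is granted by the useradmin account, which holds userAdmin (userAdmin can grant dbOwner)
In addition, adding an account in the test2 database failed (useradmin holds userAdmin only on test)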

MongoDB Enterprise > use test
MongoDB Enterprise > db.createUser({user:"dbowner",pwd:"dbowner!23",roles:["dbOwner"]})
$> mongo -u "dbowner" -p --authenticationDatabase "test"

Test log

show dbs
show collections
db.book.find()
db.book.insert({"name":"dbowner","author":"one of the role"})
db.book.createIndex({author:-1})
db.createUser({user:"ownerroletest",pwd:"ownerroletest!23", roles:["read"]})

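An account with userAdmin can create a user that has the dbOwner role

An account with the userAdmin role can create accounts and verify that the user was created
userAdmin can grant privileges only on the database it was assigned to

Databases cannot be listed, but collections can

Documents in a collection can be queried, and inserts work too

Indexes can be created, and other users can be created as well

Conclusion

Login works only through the authentication database
The userAdmin privilege must not be granted carelessly

Such an account can do little by itself, but it can use its privilege to grant higher privileges (for example, granting itself dbOwner to take control of its database)

dbAdmin has no insert privilege and can serve purely administrative purposes, but it does have drop privileges, so understand it well before granting it
dbOwner is the highest privilege on its database, but as long as neither it nor userAdmin is granted on the admin database, it is useful for delegating privileges to a development vendor when setting up a QA database

On production databases, readWrite is sufficient

Most important: grant at most dbOwner per database (schema), and never grant it on the admin database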

License: Attribution, NonCommercial


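Document relationship types

1:N relationship

Example) The relationship between a board and its posts
Can be expressed by embedding

Sample
{
    _id: ObjectId()
    , name: "Free board"
    , posts: [ { no: 1
        , title: "First post"
        , content: "Convenient"
        } ,
        { no: 2
        , title: "Second post"
        , content: "Convenient 2"
        }
    ]
}

However, if a single document grows too large, consider splitting it into two collections in a 1:N relationship.

N:N relationship

Example) The relationship between posts and users
When lookups are needed, create a separate lookup collection that stores the required information ahead of time, and use the result of querying that collection to find the target
This way the lookup collection has a 1:N relationship with posts or with users
Alternatively, a lookup field can be added to the post document as an array

1:1 relationship

Example) The relationship between a user and their points
In this case it is simplest to include the point information in the user document (the goal is to reduce joins)
However, if a single document grows too large, consider splitting it into two collections in a 1:1 relationship.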

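MongoDB Modeling

Flexible schema

No schema has to be defined before inserting
Documents map easily onto entities or fields
Validation rules can be applied to a collection (see schema-validation: https://docs.mongodb.com/manual/core/schema-validation/ )

Atomicity of write operations

Atomicity is guaranteed for a single document
When an operation such as updateMany modifies multiple documents, each single-document change is atomic, but the operation as a whole is not (documents are processed one at a time)
To get atomicity across multiple documents, use a transaction

Transactions reduce performance, so keep their use to a minimum

Summary of MongoDB modeling considerations

Data- and business-centric design

Means designing around the application's queries
Denormalize (embed) and allow data duplication to fit business requirements

Storage types for document relationships

Embedded vs. References

If a child object never appears outside its parent object, embed it; otherwise store it in a separate collection.

I read this as a recommendation to embed when a join would otherwise be needed, and to use a separate collection when it would not

Embedded

16 MB limit
Not recommended for frequent updates or updates that grow the document (causes fragmentation)
Faster reads: the needed data can be fetched with a single query

References

A more complex but flexible data structure
No limit on data size
Can provide relatively strong consistency
Because only the referenced document has to be deleted, modified, or added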

Embedded

Lives inside the document
One document's data is stored as the value of a key in another document

// Person
{
   _id: "joe",
   name: "Joe Bookreader",
   address: {
      street: "123 Fake Street",
      city: "Faketon",
      state: "MA",
      zip: "12345"
  }
}


// Address
{
   patron_id: "joe",
   street: "123 Fake Street",
   city: "Faketon",
   state: "MA",
   zip: "12345"
}

Database References

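For many use cases a denormalized model that stores related data in a single document is optimal (in some cases storing it in a separate document is preferable)
Easiest to understand as a pointer (embedding stores the whole document, whereas a reference stores its ID)
More flexible than embedding within a single document
Since 3.2, the $lookup pipeline stage can perform a left outer join against an unsharded collection in the same database
Since 3.4, the $graphLookup pipeline stage can perform a recursive search on an unsharded collection (I read this as including the collection itself)
Documents can be referenced in two ways

Manual references

Store the _id field of the referenced document as a key (field) in a document of another collection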

db.places.insert({
    name: "Broadway Center",
    url: "bc.example.net"
})
db.people.insert({
    name: "Erin",
    places_id: db.places.findOne({name: "Broadway Center"})._id,
    url:  "bc.example.net/Erin"
})

> var peopleDoc = db.people.findOne({name: "Erin"});
> var placeID = peopleDoc.places_id;
> var placesDoc = db.places.findOne({_id: placeID});
> placesDoc.url
bc.example.net

# or

> db.places.findOne({ _id: db.people.findOne({name: "Erin"}).places_id }).url
bc.example.net

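DBRefs

A document references another document using the value of the referenced document's "_id" field, its collection name, and optionally a database name
Makes it easy for a document in one collection to link to documents located in multiple collections
A DBRef takes three fields: the first two are required ($ref, $id) and the third is optional ($db)
$ref

Name of the collection that holds the referenced document

$id

Value of the _id field in the referenced document

$db

Name of the database that holds the referenced document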
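
To make the three fields concrete, here is a minimal sketch that follows a DBRef-shaped object against in-memory data. Everything in it is an assumption for illustration: the `databases` object stands in for real databases and collections, and `resolveDBRef` is a hypothetical helper, not a driver or shell function.

```javascript
// In-memory stand-in for databases and collections; all names here are
// hypothetical and only mimic the shape of a DBRef ({ $ref, $id, $db }).
const databases = {
  travel: {
    places: [{ _id: 1, name: "Broadway Center", url: "bc.example.net" }],
    people: [
      { _id: 10, name: "Erin", place: { $ref: "places", $id: 1, $db: "travel" } },
    ],
  },
};

// Follows a DBRef: $db is optional and falls back to the current database.
function resolveDBRef(ref, currentDb) {
  const dbName = ref.$db || currentDb;
  const collection = (databases[dbName] || {})[ref.$ref] || [];
  return collection.find((doc) => doc._id === ref.$id) || null;
}

const erin = databases.travel.people[0];
const place = resolveDBRef(erin.place, "travel");
console.log(place.url);
```

As with manual references, following the reference is a second lookup; the DBRef only standardizes where to look.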

Sources : 맛있는 MongoDB

MongoDB in Action

https://blog.voidmainvoid.net/241 (NoSQL lecture: how to model data in MongoDB, with examples)

https://cinema4dr12.tistory.com/375 ([MongoDB] Database References, by Geol Choi, March 10, 2014)

https://devhaks.github.io/2019/11/30/mongodb-model-relationships/


[MongoDB] [Study-10] Lock & Transactions

2021. 4. 16. 00:46

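Lock mechanism and Transactions

Transactions are not supported before 4.0
Two Phase Commit

Assign a state value to each step and store the corresponding log elsewhere so the state can be verified.
Once verification shows that everything completed, delete the state log.
The upshot is that the state has to be checked on both sides continually.

The only lock that can be taken explicitly in MongoDB is the global lock (which locks the instance); in 3.4 every lock other than the global lock is implicit > use the fsyncLock command

A global lock does not block reads, but because writes are blocked, subsequent reads and modifications can all stall (caution)
Keeping the connection that ran fsyncLock open is recommended (the unlock command may not be runnable from another connection)

Intention lock is the collective term for the IS (Intent Shared) and IX (Intent Exclusive) locks

Intention locks can be taken at the collection, database, and global levels, but not on a document (only an implicit exclusive lock is possible there)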

General Transaction

Based on the WiredTiger storage engine

The highest isolation level is Snapshot (= Repeatable-read); Serializable is not provided

Durability is guaranteed with the transaction log (journal log, redo log) and checkpoints
Even without the transaction log, data as of the last checkpoint can be recovered
However, the isolation level is only provided, not selectable

SNAPSHOT (REPEATABLE-READ)

Default isolation level of the MongoDB WiredTiger storage engine
From the start of a transaction until its commit / rollback completes, data it has already read keeps returning the same result
Explicit transactions are not supported; this applies to single documents only

Durability is guaranteed in two ways: transaction commit and checkpoint
Uncommitted changes must be smaller than the shared cache

Changes are written to disk only on commit, so the changes made in a transaction must fit within the shared cache

When DML hits a lock, it looks like waiting from the outside, but internally the operation retries (WriteConflict exception -> an error is raised and handled internally by retrying -> the user can think of it as waiting)

WriteConflict handling raises CPU usage; it can be checked with db.serverStatus()
A rising writeConflicts value in db.serverStatus() means many DML attempts hit the same document, so consider changing the application logic
find is unrelated to writeConflicts (it takes no X lock)
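
The retry behavior described above can be pictured with a toy model of optimistic concurrency. This is not WiredTiger code and makes no claim about its internals; `store`, `tryUpdate`, and `updateWithRetry` are hypothetical names, and the `writeConflicts` variable merely plays the role of the counter reported by db.serverStatus().

```javascript
// Toy model of optimistic concurrency; this is not WiredTiger code, and
// the names (store, tryUpdate, updateWithRetry) are hypothetical.
const store = { doc: { _id: 1, counter: 0 }, version: 0 };
let writeConflicts = 0; // plays the role of the writeConflicts counter

// Applies the change only if nobody modified the document since it was read.
function tryUpdate(expectedVersion, change) {
  if (store.version !== expectedVersion) {
    throw new Error("WriteConflict");
  }
  store.doc = change(store.doc);
  store.version += 1;
}

// Retries on WriteConflict so the caller only observes a slower write.
function updateWithRetry(change, interfere) {
  for (;;) {
    const seen = store.version;
    if (interfere) {
      // Simulate another writer sneaking in between read and write, once.
      tryUpdate(store.version, (d) => ({ ...d, counter: d.counter + 100 }));
      interfere = false;
    }
    try {
      tryUpdate(seen, change);
      return;
    } catch (err) {
      if (err.message !== "WriteConflict") throw err;
      writeConflicts += 1; // each retry costs CPU, as noted above
    }
  }
}

updateWithRetry((d) => ({ ...d, counter: d.counter + 1 }), true);
console.log(store.doc.counter, writeConflicts);
```

The caller sees a single successful update, while the counter records that one attempt had to be repeated, which is why a hot document shows up as rising writeConflicts rather than as client-side errors.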

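Even when several documents are sent in one statement, MongoDB splits them up internally and processes each one separately

db.users.insert([{_id:1, name:'ABC'},{_id:2, name:'DEF'}]) -> db.users.insert({_id:1, name:'ABC'}) / db.users.insert({_id:2, name:'DEF'})
So if an error occurs partway through an insert, everything before the error is inserted and the remaining data is not executed
With db.collection.bulkWrite([], {ordered:false}), it is hard to tell which data was saved when an error occurs during execution

Even when one statement changes many documents (multi:true), they are updated one at a time internally (no rollback if an error occurs partway)
Reads start and finish a transaction per batch rather than per document (reading from a maintained snapshot)

This can cause consistency problems
Snapshot retention conditions (when they are met, the snapshot is released and deleted and a new one is created)

The query has read the configured number of documents (internalQueryExecYieldIterations = 128)

That is, 128 documents have been read

The query has run for the configured time (internalQueryExecYieldPeriodMS = 10)

That is, the query has been running for 10 milliseconds or more

When a large amount of data is read, it sits in a buffer (from the snapshot) and is served from there, so data already returned by find keeps being shown even if it is deleted during the find

With a sort, the data is loaded into the sort buffer first and then returned from it
Leaving a cursor open with sorted data still in it (stopping before all data has been sent to the client) can leak memory until MongoDB closes the cursor automatically
Once you have seen all the data you need, be sure to close the cursor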

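(Transaction-level) Write Concern

Write concern is the option that determines when the server responds to a change request
Applies only to DML (update / delete / insert)
Can be set at the client, database, and collection levels
Single-node synchronization control

Four options exist
UNACKNOWLEDGED

The client does not care whether the command sent to the MongoDB server succeeded or stopped partway with an error
There is no way to know whether an error occurred
Rarely used in practice

ACKNOWLEDGED (default in recent MongoDB versions)

Reports whether the change has been applied in memory
The changed value is visible from the same session and from other sessions
Data can be lost if a failure occurs before it is flushed from memory to disk

JOURNALED

Returns only after the journal log has been written to disk (whether journaling is enabled had to be checked first; from 3.6 journaling is always enabled and cannot be turned off)
Recovery is always possible after a failure
With a replica set, a new problem appears that does not exist with a single MongoDB server (it is hard to know in advance that a problem may occur when replication falls behind)
From 3.6 journaling is always on, so JOURNALED is always available; ACKNOWLEDGED no longer differs from it and behaves as JOURNALED

FSYNC (MMAPv1)

Method used in versions that had no journal log
Returns only after the data files on disk are fully written, which makes it an expensive operation
Scheduled for removal

Synchronization control across replica set members

A setting that covers the case where the primary fails before passing its latest oplog entries to the secondaries, which would otherwise be rolled back and lost
Given as the option "{w:?}", where ? is a number or a string; the behavior changes with the value
{w:2}

Returns success or failure once 2 of 3 members (primary + 1 secondary) have processed the change to the required level
ACKNOWLEDGED + {w:2} (ACKNOWLEDGED is the default, so it can be omitted)
JOURNALED + {w:2}

{w:"majority"}

A hard-coded number like the one above has to be changed every time the member count changes
With "majority" the result is returned once a majority has acknowledged, regardless of the member count
For reads as well, members know each other's oplog apply position, so with majority a member that is behind on applying the oplog is not read from

"Tag-set name"

Tags can be assigned to members, and acknowledgement is checked against the members carrying the tag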
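
The difference between a numeric w and "majority" comes down to simple arithmetic. The sketch below is a simplified model: `majorityOf` and `isWriteAcknowledged` are hypothetical helpers, not driver APIs, and the model counts every member equally (it ignores arbiters, non-voting members, and tag sets).

```javascript
// Hypothetical helpers for reasoning about { w: ... }; not a driver API.
// Simplification: every member counts equally (no arbiters or non-voting members).
function majorityOf(memberCount) {
  return Math.floor(memberCount / 2) + 1;
}

// acks: number of members (primary included) that have applied the write.
function isWriteAcknowledged(w, acks, memberCount) {
  const required = w === "majority" ? majorityOf(memberCount) : w;
  return acks >= required;
}

console.log(majorityOf(3));                              // members needed out of 3
console.log(isWriteAcknowledged(2, 1, 3));               // only the primary has it so far
console.log(isWriteAcknowledged("majority", 3, 5));      // 3 of 5 members have it
```

With a fixed number, growing the set from 3 to 5 members would silently leave {w:2} below a majority, which is the hard-coding problem noted above.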

(Transaction-level) ReadConcern (https://docs.mongodb.com/manual/core/transactions/#std-label-transactions-write-concern)

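An option the MongoDB server provides to keep reads consistent while replication between members is in progress (purpose: consistency)
Depending on the option, a query basically waits until the secondaries have replicated the oplog up to the requested level
Three options exist ( local, majority, linearizable ), settable at the client, database, and collection levels
local : the default; returns the most recent data. Mostly read from the primary; if a failure occurs that data may be rolled back, so a phantom read is possible

For every option other than local, setting maxTimeMS is recommended
With fCV 4.4, indexes can be created inside a transaction; when created explicitly, the read concern must be set to local

majority : returns results once a majority of members hold the latest data. If every server fails, the data could still be rolled back and a phantom read is possible, but this option has the lowest chance of data disappearing through rollback

mongod --enableMajorityReadConcern .........

or in mongod.conf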

...
setParameter :
   enableMajorityReadConcern: true
...

snapshot

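linearizable : returns query results only for changes held by all members. No rollback occurs even if every server fails. Supported since 3.4. Query response time is naturally likely to be delayed, and because a query can wait indefinitely, setting a query timeout is essential.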

db.runCommand({ find:"users", filter: {name:"matt"}, readConcern: {level: "linearizable"}, maxTimeMS:5000 })

Read Preference

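The option that decides which server a client's query is sent to (purpose: distribution)
It is applied when connecting to the server, so set the read preference as soon as the connection is created
Affects only find queries; there are five modes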

db.getMongo().setReadPref('primaryPreferred')
db.users.find({name:"matt"}).readPref('primaryPreferred')

Primary (Default)

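Queries run only on the primary; if there is no primary (after a failure and before failover completes) the query fails

PrimaryPreferred

Sends to the primary when possible; if a failure leaves no primary, sends to a secondary

Secondary

Sends only to secondary members, never to the primary. With two or more secondaries, requests are distributed among them
Fails if there is no secondary

secondaryPreferred

Same as secondary, but falls back to the primary when no secondary is available

nearest

Sends to the member with the fastest response time (regardless of primary or secondary)
Of little benefit within a single network, but suitable when the replica set is distributed globally and response times differ between members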
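
The five modes can be summarized as a small selection function. This is a simplified model written for illustration, not how any driver implements it: `pickMember` is a hypothetical name, it ignores tags and staleness limits, and where a driver would spread requests across eligible members it simply takes the lowest-latency one.

```javascript
// Simplified model: ignores tags, staleness limits and the latency window,
// and always picks the lowest-latency eligible member instead of distributing.
function pickMember(mode, members) {
  const primary = members.find((m) => m.role === "primary") || null;
  const secondaries = members.filter((m) => m.role === "secondary");
  // Lowest-latency member of a list, or null when the list is empty.
  const fastest = (list) =>
    list.reduce((best, m) => (best === null || m.latencyMs < best.latencyMs ? m : best), null);

  switch (mode) {
    case "primary":
      return primary; // null means the query fails
    case "primaryPreferred":
      return primary || fastest(secondaries);
    case "secondary":
      return fastest(secondaries); // null means the query fails
    case "secondaryPreferred":
      return fastest(secondaries) || primary;
    case "nearest":
      return fastest(members);
    default:
      throw new Error("unknown read preference: " + mode);
  }
}

const members = [
  { host: "a", role: "primary", latencyMs: 20 },
  { host: "b", role: "secondary", latencyMs: 5 },
  { host: "c", role: "secondary", latencyMs: 12 },
];
console.log(pickMember("primary", members).host);
console.log(pickMember("nearest", members).host);
console.log(pickMember("secondary", [members[0]])); // no secondary available
```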

Handling duplicate documents in a sharded environment

A sharded cluster checks which shard owns a document in only two cases

The query includes the shard metadata version
The query condition includes the shard key

The config server's metadata changes every time a chunk moves (the metadata version increases by 1)
The version is passed with every query; a query that carries it is called a 'Versioned Query'

Carrying the version is what guarantees consistent point-in-time data even when chunks move
However, versioned queries are used only when running on the primary; queries on a secondary carry no version (Unversioned Query)

A query that includes the shard key is routed to a specific shard only, so it returns only documents in chunks that shard owns
Without the shard key and when reading from a secondary, duplicate documents can be returned

[ Multi-Document Transaction]

The MongoDB client API (driver) must also be the 4.2 version
From 4.4, collections can be created explicitly inside a transaction (the client API must also be 4.4)
Available since MongoDB 4.0, but only for replica sets; sharded clusters are supported from 4.2
A multi-document transaction can span multiple operations, collections, and databases; on commit all changes are saved, and on rollback all changes are discarded
Until the commit is final, nobody else can see the data being changed (dirty_read : no)
It may be slower than single-document writes that rely on embedded documents and arrays, so it should not be used as a substitute for them
Requires featureCompatibilityVersion (fCV) 4.0 or later and works only with the WiredTiger and in-memory storage engines (set from the admin database)
db.adminCommand({ getParameter:1, featureCompatibilityVersion:1})
db.adminCommand({ setFeatureCompatibilityVersion : "4.2"})
Collections in the config, admin, and local databases cannot be read or written, and system.* collections cannot be written. Indexes also cannot be added to or dropped from a collection with a multi-document transaction in progress
Even when the total transaction size is 16 MB or more, as many oplog entries as needed are created to handle it
Before running a transaction, setting the write concern to majority is the safe choice
Transactions are also supported with a WiredTiger primary and in-memory secondaries


[MongoDB] [Study-9] Index

2021. 4. 16. 00:43

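Indexes in MongoDB are strictly case-sensitive
When a document is updated only the affected index keys change, but if the updated document is larger than its existing extent it may be migrated to a larger extent, which degrades performance (take extent size into account at design time)
Using sort() and limit() together is good for performance

With an index, results come back already sorted, and limit cuts down unnecessary scanning, which is the expected benefit

reIndex drops the existing indexes and then rebuilds them
B-Tree Index

When an index is created, it is a B-Tree made of the field's values and pointers to the documents
Field value + pointer to the document

When a collection is created, a unique index on the _id field is created automatically (not specifying a custom _id is recommended / internal index creation)
There is no clustered index concept, so a PK and a secondary index share the same internal structure

Looking up a document through an index goes Index -> hidden field (Record-Id) -> data, for two index lookups in total
The Record-Id index created on the hidden field is in effect the clustered index, but users cannot create or use it, so for practical purposes there is no clustered index.

An index can be created on a subdocument, but the index is not compressed, so it can easily grow large and become counterproductive
- In the WT cache the data is uncompressed, but indexes stay prefix-compressed both in memory and on disk (which improves memory efficiency)
- Before creating an index on a subdocument, decide exactly how it will be queried; the field order must also be preserved
- It works as an index only when the query supplies all fields of the subdocument in the same order (same as above)
- Must read : https://www.percona.com/blog/2020/07/24/mongodb-utilization-of-an-index-on-subdocuments/

Indexes also use prefix compression
subdocument

Usable only for equality matches and prefix matches; not usable for inequality (negation) comparisons, transformed field values, or applied functions (same as an RDBMS)

[ Guide ]

Prefer equality matches.
Create indexes with the fields most often used in query conditions as the leading fields
Create indexes on fields with good (high) cardinality
For date searches, prefer a range search on _id where possible (a date range with $gte and $lte)

Before adding a separate index or field, consider whether _id can serve instead

Indexes also take disk space and change every time the indexed field is modified, so avoid creating them indiscriminately or on large fields
Choose the index type that fits the search purpose (notably text, or distance on a map)
Avoid indexing subdocument fields (a smaller index performs better)
Avoid nested (two-level) arrays; updates and searches become difficult when needed -> to be verified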
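
The guideline about date ranges on _id relies on the fact that an auto-generated ObjectId begins with a 4-byte creation timestamp in seconds. A minimal sketch of building the range boundaries follows; `objectIdHexFromDate` is a hypothetical helper, and the approach only applies when _id values are ObjectIds generated at insert time rather than custom values.

```javascript
// Builds the 24-character hex string of the smallest possible ObjectId for a
// given moment: 4-byte timestamp in seconds followed by 8 zero bytes.
// Assumption: _id values are auto-generated ObjectIds, not custom keys.
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

const from = objectIdHexFromDate(new Date("2021-04-16T00:00:00Z"));
const to = objectIdHexFromDate(new Date("2021-04-17T00:00:00Z"));
console.log(from, to);

// In the mongo shell the range query would then look like this
// (the collection name "book" is only an example):
// db.book.find({ _id: { $gte: ObjectId(from), $lt: ObjectId(to) } })
```

Because the default _id index already exists, such a query needs no extra index or date field, which is the point of the guideline.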

[createIndex]

db.collection.createIndex(keys, options)

Index Name	Index Type	Parameter	Type	Memo
Non-Unique / Unique Index	Option	unique	Boolean	Whether values must be unique {unique : true}
Single Field / Compound Index				Creates an index on one field or on several fields > In a compound index the leading field matters, so putting the field most often used in find / update first lets more queries use the index and improves efficiency
Multi Key Index				Indexes the contents of an array; index entries are created automatically for every array element > no need to declare the multikey type explicitly
Background Index (the option is gone from v4.2, where this behavior is the default: the build locks only at its start and end)	Option	background	Boolean	Index builds are expensive and degrade performance; building in the background at lower cost reduces the load on the database > memory must be monitored while the background build runs
TTL Index	Option	expireAfterSeconds	Integer	Automatically deletes documents once the set time has passed; a time limit is configured so that old data is removed
Sparse Index	Option	sparse	Boolean	Efficient when most documents have no value for the field and only a few do (documents without a value are left out, so the index is smaller and more efficient)
Partial Index (v3.2)	Option	partialFilterExpression	Document	Indexes only the documents that match a filter on the field. Ex ) when queries mostly look for amounts of 10 million won or more, index only those (queries below that amount cannot use the index) db.emp.createIndex({salary:1},{partialFilterExpression:{salary:{$gte:10000000}}})
GeoSpatial(2d) Index	Option	bits min max	Integer number Number	Spatial index used for distances and coordinates within a space; provides efficient queries on geographic coordinate data using planar geometry
GeoSpatial(2dsphere) Index	Option	2dsphereIndexVersion	Integer	Provides efficient queries on geographic coordinate data using spherical geometry
GeoHayStack Index	Option	bucketSize	Number
Text Index	Option	weights default_language Language_override textIndexVersion	document String String Integer	Only one text index can be created per collection
Hashed Index				Hashes the field value to support hash-based sharding; used to distribute data evenly across a sharded cluster
Covered Index				When a query against a compound index can return the matching documents from the index alone; .explain() then shows [...indexOnly:true]
Wildcard index (v4.2)		wildcardProjection	document	https://docs.mongodb.com/manual/core/index-wildcard/#wildcard-index-core Cannot be combined with the following: * Compound * TTL * Text * 2d (Geospatial) * 2dsphere (Geospatial) * Hashed * Unique
collation (v3.4)	Option	collation	Document	Mainly for languages with accents and the like (French, etc.), which are compared after conversion to binary form. Not supported by: * text indexes, * 2d indexes, and * geoHaystack indexes. If an index was created without a collation option, creating the same index again with a collation option fails; the existing index must be dropped and recreated

[ Default Indexes : _id]

When a collection is created, an index on the _id field is created by default unless stated otherwise

It is a unique index and cannot be dropped
A PK and a secondary index are identical internally, so there is no clustered index concept

[ Single Field Indexes ]

An index built on one field is a single field index
The index is sorted by the field it was created on

[ Compound(Composite) Index ]

An index that combines two or more fields is a compound index
Each field can be given its own sort order

[ Multikey Indexes ]

Created on a field that holds an embedded document (a document inside the document) or an array

A multikey index cannot be used as a shard key
A shard key value must map to a single chunk, which is impossible when one document produces several entries
A multikey index cannot serve as a covering index

Indexing a field that holds an array automatically produces a multikey index
A unique multikey index enforces uniqueness across the collection, not within a document
A compound multikey index can contain only one array (multikey) field per document

Also, once a compound multikey index exists without problems, an insert or update that puts an array into the other indexed field fails

ex 1) When both a and b are arrays as below, the compound multikey index {a:1, b:1} cannot be created.
{ _id: 1, a: [ 1, 2 ], b: [ 1, 2 ], category: "AB - both arrays" }

ex 2) With the index {a:1, b:1} already created without problems, changing both a and b to arrays in one document fails
{ _id: 1, a: [1, 2], b: 1, category: "A array" } <- fails if an insert or update turns b into an array
{ _id: 2, a: 1, b: [1, 2], category: "B array" } <- fails if an insert or update turns a into an array

ex3) To use a compound multikey index across several arrays, design the documents as below; the index { "a.x": 1, "a.z": 1 } then works
{ _id: 1, a: [ { x: 5, z: [ 1, 2 ] }, { z: [ 1, 2 ] } ] }
{ _id: 2, a: [ { x: 5 }, { z: 4 } ] }
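The one-array-per-document rule from ex 1 and ex 2 can be sketched in plain JavaScript. This is only an illustration: canIndexCompound is a hypothetical helper, not a MongoDB API, and it looks at top-level fields only (it does not model dotted paths such as "a.x" from ex 3).

```javascript
// Hypothetical helper (not part of MongoDB): returns true when a document could
// be indexed by a compound index over `indexedFields`, i.e. when at most one of
// those fields holds an array in this document.
function canIndexCompound(doc, indexedFields) {
  const arrayFields = indexedFields.filter((field) => Array.isArray(doc[field]));
  return arrayFields.length <= 1;
}

const indexedFields = ["a", "b"]; // stands in for the index { a: 1, b: 1 }

console.log(canIndexCompound({ _id: 1, a: [1, 2], b: 1 }, indexedFields));      // only a is an array
console.log(canIndexCompound({ _id: 2, a: 1, b: [1, 2] }, indexedFields));      // only b is an array
console.log(canIndexCompound({ _id: 3, a: [1, 2], b: [1, 2] }, indexedFields)); // both are arrays
```

The first two documents pass the check and the third does not, which mirrors why the insert or update in ex 2 is rejected.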

[ Text Indexes ]

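Supports text search queries on string content
Any field whose value is a string or an array of strings can be included
Only one text index can be created per collection

That single text index can be compound, covering several fields
Specify "text" as the value of each field to index (other index types take 1 or -1)

By default a text index is named "<field>_text"
$meta exposes the relevance score (textScore) of the searched text

MongoDB provides no n-gram or morphological analysis for Korean by default; it tokenizes on delimiters (whitespace)
Matching part of a word would require an n-gram based full-text index
Reference: https://sarc.io/index.php/nosql/1769-mongodb-text-index

Indexing is based on delimiters (whitespace)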

# Creating a text index
db.array.createIndex({"month_data":"text"})

# Creating a compound text index
db.reviews.createIndex(
  {
    subject: "text",
    comments: "text"
  }
)

# Sorting by the $meta text score
db.array.find({$text:{$search:"서울"}},{score:{$meta: "textScore"}}).sort({score:{$meta:"textScore"}}).pretty()

# Wildcard Text Index
db.collection.createIndex( { "$**": "text" } )

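Creating more than one text index fails

A text index on month_data already exists, so creating another one raises an error

Adding the score ($meta)
- Searching for "아" in the string "아앙아" gives a score of 2 ("아" x 2)

> db.array.find({$text:{$search:"서울"}},{score:{$meta: "textScore"}}).sort({score:{$meta:"textScore"}}).pretty()
When the search term is found in the text, the score is multiplied by the number of occurrences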

[ Wildcard Indexes ]

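Appending $** to a field path creates a wildcard index over everything under that field
Each subfield under the field is scanned and indexed individually

It recurses into embedded documents and arrays and stores the values of all fields

All fields on Index

Indexes every field of each document
The _id field is omitted

wildcardProjection

Option for including or excluding specific field paths in a wildcard index
Valid only when the index covers all fields ({ "$**": 1 })

Indexing keeps descending into any embedded documents it encounters

# Wildcard index on the subdocuments of the attributes field

db.collection.createIndex({ "attributes.$**": 1 })

# All fields on Index

db.collection.createIndex( { "$**" : 1 } )

# Wildcard Text Index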

db.collection.createIndex( { "$**": "text" } )
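To picture what "recursing into documents and arrays" means, the sketch below lists the (path, value) pairs that a wildcard index on attributes.$** would conceptually cover. It is a plain JavaScript illustration, not how MongoDB builds the index; wildcardPaths and the sample document are made up for this example, and only plain JSON-like values are handled.

```javascript
// Hypothetical helper: walks a value recursively and returns [path, value]
// pairs for every scalar it reaches. Arrays are descended into without adding
// an index position to the path.
function wildcardPaths(value, prefix = "") {
  if (Array.isArray(value)) {
    return value.flatMap((item) => wildcardPaths(item, prefix));
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([key, child]) =>
      wildcardPaths(child, prefix ? `${prefix}.${key}` : key));
  }
  return [[prefix, value]];
}

// Made-up sample document
const product = {
  _id: 1,
  attributes: { color: "red", size: { w: 10, h: 20 }, tags: ["a", "b"] },
};

// Pairs conceptually covered by { "attributes.$**": 1 }
console.log(wildcardPaths(product.attributes, "attributes"));
```

Each nested field gets its own dotted path, which is why a single wildcard index can serve queries on fields that are not known in advance.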

[ 2dsphere Indexes ]

[ 2d Indexes ]

[ geoHaystack Indexes ]

[ Hashed Indexes ]

[ Index Properties ]

[ Index Builds on Populated Collections ]

[ Index Intersection ]

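Two indexes on different fields are used together for a single query (one query uses two indexes)
For Equal + Equal queries, the AND_SORTED stage is used
For Equal + Range queries, the AND_HASHED stage is used
Index intersection is neither more efficient than the alternatives nor a modern algorithm, so it is not an effective optimization.
(Seeing index intersection usually means the indexes are poorly designed [one single-field index per field] and the planner has no better choice, so the index design should be reconsidered)
The intersection optimization is chosen only when no single index is judged able to optimize the query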

[ Manage Indexes ]

[ Measure Index Use ]

[Indexing Strategies ]

[ Indexing Reference ]

[getIndexes()]

Get a collection's index information: db.collection.getIndexes()
Requires the listIndexes privilege (included in the read role)

[ Index Usage Statistics ]

Index usage statistics
db.monsters.aggregate([{ $indexStats: {} }]).pretty()
ops: 0 means the _id index has never been used for a lookup

[ explain() ]

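Checking explain output
executionStats mode

Whether an index was used
Number of documents scanned
Query execution time

allPlansExecution mode

Partial execution statistics gathered while selecting a query plan
Why the winning query plan was chosen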

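[ Background Index Creation ]

A foreground index build takes a collection lock (queries wait and sessions are briefly blocked, but the build itself is fast).
A background build avoids that lock, so concurrency improves
During a background build, the build pauses while sessions are using the collection and resumes once they finish (so it takes longer than a foreground build)
The build is written to the oplog only after it completes; secondaries then replay it, also in the background (in v2.4 a bug made secondaries build in the foreground, so take care)
If the collection holds many documents or has very heavy session traffic, another option is to build in the foreground on a secondary first and then swap it with the primary
An RDBMS uses buffer space when building an index; MongoDB uses neither a separate buffer nor a transaction log (undo log). The build can therefore take a long time, but the simplicity means it does not fail or destabilize the database because of the buffer area
Dropping an index only changes metadata and deletes the data files tied to the index, so it is very fast (no maintenance window needed; it does take a brief database lock, so drop when query throughput is low)
If the database restarts during a background build (the index build process is killed), the build restarts when the database starts, this time in the foreground. Setting the indexBuildRetry option to false prevents this
To check progress, use the MongoDB log, or the oplog to confirm the index was created on the secondaries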

License: Attribution-NonCommercial


[MongoDB][Study-8] Aggregation

2021. 4. 16. 00:14

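Ways to aggregate documents

Load everything from the DB and aggregate in the application

Every document must leave MongoDB over the network protocol, so all documents are loaded into memory and network cost is incurred

Use MongoDB's map-reduce feature

Uses memory to exchange data with the JavaScript engine (BSON is converted to JavaScript)

Takes the fields returned by the user's query and runs JavaScript in a separate thread (aggregation works similarly, but runs in C++ rather than a JS thread and is much faster)

The jsMode setting makes it faster but needs a large amount of memory

jsMode: false

Adds a step that sends data from the JavaScript engine back into MongoDB

jsMode: true

Documents are grouped inside the JavaScript engine

Reduce step

Performs the computation within the grouped documents

Runs in the JS engine

Still maintained inside MongoDB because of its flexibility
However, as more aggregation pipeline operators have been added, the pipeline is now used more than map-reduce (map-reduce is used when aggregation cannot do the job)

Use MongoDB's aggregation pipeline

Documents are received sequentially and aggregated inside MongoDB (fast, with little memory)

Open question: does processing documents sequentially still produce accurate results?
Open question: everything is read either way, so is the difference only whether it is all loaded into memory at once or processed one document at a time?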

	Application	Map-Reduce	Pipeline	Notes
Flexibility	Good	Good	Poor
Processing speed	Worst	Moderate	Best	Pipeline reported as about 10x faster than map-reduce
RAM usage	Very high	High	Low
Where it runs	Inside the application	JavaScript engine	Inside MongoDB

Reference: https://stackoverflow.com/questions/32131828/mongo-map-reduce-or-aggregate-strategy

https://stackoverflow.com/questions/13908438/is-mongodb-aggregation-framework-faster-than-map-reduce

Reference: https://sysdig.com/blog/mongodb-showdown-aggregate-vs-map-reduce/

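What is Aggregation?

Groups multiple documents, computes over them, and returns a single result
Three forms exist: the aggregation pipeline, map-reduce, and single-purpose aggregation methods
A pipeline is a chain in which each stage's output becomes the next stage's input (like a linked list)

Results accumulate from stage to stage

Main pipeline stages

Source: "맛있는 MongoDB"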

git clone https://github.com/Karoid/mongodb_tutorials.git
cd mongodb_tutorials/operating_expenses/

mongoimport -d operating_expenses -c population --file population.json
mongoimport -d operating_expenses -c city_or_province --file city_or_province.json
mongoimport -d operating_expenses -c local --file local.json

Stage	Description	Syntax
$project	Chooses which fields to hide and which new fields to create	{$project:{<field> : <boolean>}} {$project:{<field> : <expression>}}
$group	Groups documents that share the same value for the _id expression	{$group: {_id:expression, field1:{accumulator1:expression1},...}}
$match	Filters documents and passes on the matches; similar to find	{$match:{<query>}}
$unwind	Deconstructs an array field of the input document and outputs one document per element	{$unwind:<field path>} { $unwind : { path:<field path>, includeArrayIndex:<string>, preserveNullAndEmptyArrays: <boolean> } }
$out	Writes the pipeline result to a collection	{ $out: "<collection name>"}

$group

Operator	Description	Example
$first	Returns the first value in the group. Only meaningful after a $sort	{ $group : { _id:"$<group key>", <output field>:{$first:"$<value field>"} } } (I expected the same result as $max, but the values differ slightly) db.population.aggregate({$group:{_id:"$city_or_province",population:{$first:"$population"}}})
$last	Returns the last value in the group. Only meaningful after a $sort	{ $group : { _id:"$<group key>", <output field>:{$last:"$<value field>"} } } (I expected the same result as $min, but the values differ slightly) db.population.aggregate({$group:{_id:"$city_or_province",population:{$last:"$population"}}})
$max	Returns the maximum value of the field in the group	{ $group : { _id:"$<group key>", <output field>:{$max:"$<value field>"} } } db.population.aggregate({$group:{_id:"$city_or_province",population:{$max:"$population"}}})
$min	Returns the minimum value of the field in the group	{ $group : { _id:"$<group key>", <output field>:{$min:"$<value field>"} } } db.population.aggregate({$group:{_id:"$city_or_province",population:{$min:"$population"}}})
$avg	Returns the average value of the field in the group	{ $group : { _id:"$<group key>", <output field>:{$avg:"$<value field>"} } }
$sum	Returns the sum of the field in the group	{ $group : { _id:"$<group key>", <output field>:{$sum:"$<value field>"} } } db.population.aggregate({$group:{_id:"$city_or_province",population:{$sum:"$population"}}})
$push	Returns all values of the field in the group as an array. Duplicates are kept	{ $group : { _id:"$<group key>", <output field>:{$push:"$<value field>"} } } db.population.aggregate({$group:{_id:"$city_or_province",population:{$push:"$population"}}})
$addToSet	Returns all values of the field in the group as an array with no duplicate elements	{ $group : { _id:"$<group key>", <output field>:{$addToSet:"$<value field>"} } } db.population.aggregate({$group:{_id:"$city_or_province",population:{$addToSet:"$population"}}})

Sample data

$match

Similar to the find command

# Documents with rating >= 4, with their user ids collected into an array per rating ($push)

> db.rating.aggregate([
{$match: {rating: {$gte:4}}}
, {$group: {_id: "$rating", user_ids:{$push:"$user_id"}}}
])
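How each stage's output feeds the next stage can be sketched without a database. The snippet below imitates the $match and $group/$push stages above with plain JavaScript array operations; the ratings data is made up, and matchStage and groupStage are illustrative stand-ins, not MongoDB internals.

```javascript
// Made-up input documents standing in for the rating collection
const ratings = [
  { user_id: 1, rating: 5 },
  { user_id: 2, rating: 3 },
  { user_id: 3, rating: 4 },
  { user_id: 4, rating: 5 },
];

// Stand-in for {$match: {rating: {$gte: 4}}}
const matchStage = (docs) => docs.filter((doc) => doc.rating >= 4);

// Stand-in for {$group: {_id: "$rating", user_ids: {$push: "$user_id"}}}
const groupStage = (docs) => {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.rating)) groups.set(doc.rating, { _id: doc.rating, user_ids: [] });
    groups.get(doc.rating).user_ids.push(doc.user_id);
  }
  return [...groups.values()];
};

// The pipeline is an ordered list of stages; each one consumes the previous output
const pipeline = [matchStage, groupStage];
const result = pipeline.reduce((docs, stage) => stage(docs), ratings);
console.log(result);
```

Placing the filter first means the grouping stage only ever sees the documents that survived $match, which is the usual reason to put $match as early as possible.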

$unwind

Splits the array held in one document so that each output document carries a single element as the field value
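A minimal sketch of that behaviour in plain JavaScript, assuming the default $unwind options: unwind is a hypothetical helper and the documents are made up. Documents whose field is missing, null, or an empty array produce no output, and a non-array value is passed through as a single document.

```javascript
// Hypothetical stand-in for {$unwind: "$<field>"} with default options
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (value === undefined || value === null) return []; // dropped by default
    if (!Array.isArray(value)) return [doc];               // treated as a one-element array
    return value.map((item) => ({ ...doc, [field]: item })); // empty array yields nothing
  });
}

const unwound = unwind(
  [
    { _id: 1, tags: ["a", "b"] },
    { _id: 2, tags: [] },
  ],
  "tags"
);
console.log(unwound);
```

The first document becomes two documents, one per tag, and the second disappears because its array is empty.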


[MongoDB][Study-7] Find / FindAndModify / Cursor

2021. 3. 28. 22:45

Sample data

"맛있는 MongoDB"

Shared sample data

$ git clone https://github.com/Karoid/mongodb_tutorials.git
ec2-user@mongodb:~$ cd mongodb_tutorials/car_accident/

ec2-user@mongodb:~/mongodb_tutorials/car_accident$ mongoimport -d car_accident -c area --file area.json
2021-03-02T13:29:00.380+0000    connected to: localhost
2021-03-02T13:29:00.403+0000    imported 228 documents
ec2-user@mongodb:~/mongodb_tutorials/car_accident$
ec2-user@mongodb:~/mongodb_tutorials/car_accident$ mongoimport -d car_accident -c by_month --file by_month.json
2021-03-02T13:29:32.252+0000    connected to: localhost
2021-03-02T13:29:32.309+0000    imported 227 documents
ec2-user@mongodb:~/mongodb_tutorials/car_accident$
ec2-user@mongodb:~/mongodb_tutorials/car_accident$ mongoimport -d car_accident -c by_road_type --file by_road_type.json
2021-03-02T13:29:50.266+0000    connected to: localhost
2021-03-02T13:29:50.309+0000    imported 227 documents
ec2-user@mongodb:~/mongodb_tutorials/car_accident$
ec2-user@mongodb:~/mongodb_tutorials/car_accident$ mongoimport -d car_accident -c by_type --file by_type.json
2021-03-02T13:30:08.372+0000    connected to: localhost
2021-03-02T13:30:08.408+0000    imported 687 documents

> show dbs
admin        0.000GB
car_accident  0.000GB
config        0.000GB
local        0.000GB
test          0.000GB
>
> use car_accident
switched to db car_accident
> show tables
area
by_month
by_road_type
by_type

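There are several find-related commands; this guide covers the most commonly used ones, and more will be added as needed

Command	Description
find()	Search
findAndModify()	Search, then modify. Supports update, upsert, and remove. With new: true, returns the document after the update; with new: false or omitted, returns the document before the update db.monsters.findAndModify({ query: { name: "Dragon" }, update: { $inc: { att: 1000 }, $set: { hp: 4000 } }, upsert: true, new: true })
findOne()	Returns a single document
findOneAndDelete()	Finds a single document and deletes it
findOneAndReplace() (v3.2+)	Finds a single document and replaces it. Set returnNewDocument: true to return the document after the change instead of before. Update changes only the specified fields, whereas replace drops every field that is not in the replacement document, so prefer update where possible
findOneAndUpdate() (v3.2+)	Finds a single document and updates it. Set returnNewDocument: true to return the document after the change instead of before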

[Find]

When using find, project only the fields you need (covered query).
Ex) db.bios.find( {<condition>}, {_id:0, name:1 , money:1})
- When every field the query returns is contained in one index, only the index is read, which is much faster than reading documents (index keys are usually in RAM or stored sequentially on disk)
- If _id is not explicitly set to 0 it appears in the result and the query is no longer covered, so always set _id to 0
- The query must not test a field for null or for being missing, i.e. {"field" : null} or {"field" : {$eq : null}}
- From v3.6 this also applies to fields in embedded documents (apparently not possible in earlier versions)
- Restrictions

Covered Query

How to specify fields

Always state whether _id is wanted (show or hide it explicitly in the projection)
Only _id may be mixed freely; for other fields, list only the ones to show with 1. Adding another field with 0 raises an error
Ex ) > db.thing.find( { }, {_id:0, empno: 1} ) // shows only empno; _id and all other fields are hidden. // _id should always be stated, and only the fields to show are set to 1 // setting another existing field such as birth to 0 raises an error, because a projection cannot mix inclusion and exclusion (except for _id)
To see everything except empno, use > db.thing.find( { }, {empno: 0} )
To show specific fields, list only the fields to show; mixing inclusion and exclusion raises an error
Conversely, to hide specific fields, list only the fields to hide

[Field]
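The mixing rule can be written down as a small check. This is a sketch only: projectionMode is a hypothetical helper, not a driver or shell function, and it handles plain 0/1/true/false flags, not projection operators such as $slice or $elemMatch.

```javascript
// Hypothetical helper: classifies a projection as "inclusion" or "exclusion"
// and throws when the two are mixed. _id is skipped because it may be combined
// with either style.
function projectionMode(projection) {
  const flags = Object.entries(projection)
    .filter(([field]) => field !== "_id")
    .map(([, value]) => Boolean(value));
  const includes = flags.some((flag) => flag);
  const excludes = flags.some((flag) => !flag);
  if (includes && excludes) {
    throw new Error("projection cannot mix inclusion and exclusion (except for _id)");
  }
  return includes ? "inclusion" : "exclusion";
}

console.log(projectionMode({ _id: 0, empno: 1 })); // inclusion
console.log(projectionMode({ empno: 0 }));         // exclusion

try {
  projectionMode({ empno: 1, birth: 0 });
} catch (err) {
  console.log(err.message); // mixing is rejected
}
```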

Covered Query Test

# Check the index on semester
> db.employee.getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_"
        },
        {
                "v" : 2,
                "key" : {
                        "semester" : 1
                },
                "name" : "semester_1"
        }
]

# Unless _id is explicitly set to 0 it is returned as well, so it must be excluded explicitly to confirm that the query is covered
> db.employee.find({semester:3},{_id:0, semester:1}).explain()

# Without an explicit exclusion, _id is returned, so the query is not covered
> db.employee.find({semester:3},{"field" : null}).explain()

# Checking the covered index

# Without an explicit exclusion, _id is returned, so the query is not covered (plain index scan followed by a fetch)

# Failure - the covered query on embedded documents fails
> db.inventory.insertMany( [
    { item: "journal", instock: [ { warehouse: "A", qty: 5 }, { warehouse: "C", qty: 15 } ] },
    { item: "notebook", instock: [ { warehouse: "C", qty: 5 } ] },
    { item: "paper", instock: [ { warehouse: "A", qty: 60 }, { warehouse: "B", qty: 15 } ] },
    { item: "planner", instock: [ { warehouse: "A", qty: 40 }, { warehouse: "B", qty: 5 } ] },
    { item: "postcard", instock: [ { warehouse: "B", qty: 15 }, { warehouse: "C", qty: 35 } ] }
]);
{
        "acknowledged" : true,
        "insertedIds" : [
                ObjectId("603cf504850313ab26e27ced"),
                ObjectId("603cf504850313ab26e27cee"),
                ObjectId("603cf504850313ab26e27cef"),
                ObjectId("603cf504850313ab26e27cf0"),
                ObjectId("603cf504850313ab26e27cf1")
        ]
}
>
> db.inventory.createIndex({item:1, "instock.qty":1})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
# Check the data
> db.inventory.find({item:"journal", "instock.qty":5}, {_id:0,item:1, "instock.qty":1})
{ "item" : "journal", "instock" : [ { "qty" : 5 }, { "qty" : 15 } ] }

# Check whether the query is covered
> db.inventory.find({item:"journal", "instock.qty":5}, {_id:0,item:1, "instock.qty":1}).explain()

# Single-element case: fails
> db.inventory.insertOne({ item: "test_item", instock: [{warehouse: "A", qty: 5}]})
> db.inventory.find({item:"test_item", "instock.qty":5}, {_id:0,item:1, "instock.qty":1}).explain()

# Single-element case: fails (2)
> db.inventory.insertOne({ item: "test_item_2", instock: [{qty: 5}]})
> db.inventory.find({item:"test_item_2", "instock.qty":5}, {_id:0,item:1, "instock.qty":1}).explain()

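Comparison operators

operator	Description
$eq	(equals) Matches values equal to the given value	find({ <field>: { <operator>: <value> } })
$gt	(greater than) Matches values greater than the given value
$gte	(greater than or equals) Matches values greater than or equal to the given value
$lt	(less than) Matches values less than the given value
$lte	(less than or equals) Matches values less than or equal to the given value
$ne	(not equal) Matches values not equal to the given value
$in	Matches values contained in the given array	Must be queried with an array: find({ <field>: { $in / $nin: [ <value A>, <value B> ] } })
$nin	Matches values not contained in the given array

Logical operators

operator	Description
$or	true when at least one of the given conditions is true	$or / $and / $nor take an array: find({ $or / $and / $nor: [ {<condition A>}, {<condition B>} ] }); $not wraps an operator expression on one field: find({ <field>: { $not: { <operator-expression> } } })
$and	true when all of the given conditions are true
$not	true when the given condition is false
$nor	true when all of the given conditions are false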

> db.inventory.find({"instock.qty":{$eq:5}})
{ "_id" : ObjectId("603cf504850313ab26e27ced"), "item" : "journal", "instock" : [ { "warehouse" : "A", "qty" : 5 }, { "warehouse" : "C", "qty" : 15 } ] }
{ "_id" : ObjectId("603cf504850313ab26e27cee"), "item" : "notebook", "instock" : [ { "warehouse" : "C", "qty" : 5 } ] }
{ "_id" : ObjectId("603cf504850313ab26e27cf0"), "item" : "planner", "instock" : [ { "warehouse" : "A", "qty" : 40 }, { "warehouse" : "B", "qty" : 5 } ] }
{ "_id" : ObjectId("603cf675850313ab26e27cf2"), "item" : "test_item", "instock" : [ { "warehouse" : "A", "qty" : 5 } ] }
{ "_id" : ObjectId("603cf6d7850313ab26e27cf3"), "item" : "test_item_2", "instock" : [ { "qty" : 5 } ] }

# county is 종로구 or 중구, or population is at least 3,552,490
> db.area.find({$or:[{county: "종로구"},{county:"중구"},{population:{$gte:3552490}}]})

{ "_id" : ObjectId("5c88f9f70da47a8507752775"), "city_or_province" : "서울", "county" : "종로구", "population" : 152737 }
{ "_id" : ObjectId("5c88f9f70da47a8507752776"), "city_or_province" : "서울", "county" : "중구", "population" : 125249 }
{ "_id" : ObjectId("5c88f9f70da47a850775278e"), "city_or_province" : "부산", "county" : "중구", "population" : 45208 }
{ "_id" : ObjectId("5c88f9f70da47a8507752830"), "city_or_province" : "대구", "county" : "중구", "population" : 79712 }
{ "_id" : ObjectId("5c88f9f70da47a850775283f"), "city_or_province" : "인천", "county" : "중구", "population" : 115249 }
{ "_id" : ObjectId("5c88f9f70da47a8507752848"), "city_or_province" : "대전", "county" : "중구", "population" : 252490 }
{ "_id" : ObjectId("5c88f9f70da47a850775284c"), "city_or_province" : "울산", "county" : "중구", "population" : 242536 }

# county is 종로구 or 중구, and population is at least 125,249 ($and and $or combined)
> db.area.find({$and:[{$or:[{county: "종로구"},{county:"중구"}]},{population:{$gte:125249}}] })

{ "_id" : ObjectId("5c88f9f70da47a8507752775"), "city_or_province" : "서울", "county" : "종로구", "population" : 152737 }
{ "_id" : ObjectId("5c88f9f70da47a8507752776"), "city_or_province" : "서울", "county" : "중구", "population" : 125249 }
{ "_id" : ObjectId("5c88f9f70da47a8507752848"), "city_or_province" : "대전", "county" : "중구", "population" : 252490 }
{ "_id" : ObjectId("5c88f9f70da47a850775284c"), "city_or_province" : "울산", "county" : "중구", "population" : 242536 }

# county is 종로구 or 중구, and population is at least 125,249 ($and and $in combined)
> db.area.find({$and:[{county:{$in:["종로구","중구"]}}, {population:{$gte:125249}}] })

{ "_id" : ObjectId("5c88f9f70da47a8507752775"), "city_or_province" : "서울", "county" : "종로구", "population" : 152737 }
{ "_id" : ObjectId("5c88f9f70da47a8507752776"), "city_or_province" : "서울", "county" : "중구", "population" : 125249 }
{ "_id" : ObjectId("5c88f9f70da47a8507752848"), "city_or_province" : "대전", "county" : "중구", "population" : 252490 }
{ "_id" : ObjectId("5c88f9f70da47a850775284c"), "city_or_province" : "울산", "county" : "중구", "population" : 242536 }

$regex operator

Documents can be found with the $regex operator (regular expressions)

{ <field>: { $regex: /pattern/, $options: '<options>' } }
{ <field>: { $regex: 'pattern', $options: '<options>' } }
{ <field>: { $regex: /pattern/<options> } }
{ <field>: /pattern/<options> }

As in the fourth form, the regex can be written directly without $regex; $options can also be used

option	Description
i	Case-insensitive
m	With anchors (^ and $), matches at the start and end of each line when the value contains \n
x	Ignores all whitespace in the pattern
s	Lets dot (.) match \n as well

Find documents whose item matches the regex test_item_[1-2]

The pattern is not wrapped in quotes.

> db.inventory.find({item : /test_item_[1-2]/})
{ "_id" : ObjectId("603cf6d7850313ab26e27cf3"), "item" : "test_item_2", "instock" : [ { "qty" : 5 } ] }

# Case-insensitive match with the i option
> db.inventory.find({item : /Test_item_[1-2]/i})
{ "_id" : ObjectId("603cf6d7850313ab26e27cf3"), "item" : "test_item_2", "instock" : [ { "qty" : 5 } ] }
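The i, m, and s options behave the same way as the JavaScript regex flags of the same name, so they can be tried without a database. The sample value below is made up. The x option has no JavaScript flag equivalent and is left out of this sketch.

```javascript
// Made-up multi-line value to test against
const value = "first line\nTest_item_2";

const regexChecks = {
  caseSensitive: /test_item_[1-2]/.test(value), // "T" differs in case
  i: /test_item_[1-2]/i.test(value),            // i: ignore case
  anchorWithoutM: /^Test_item/.test(value),     // ^ matches only at the start of the string
  m: /^Test_item/m.test(value),                 // m: ^ also matches after each \n
  dotWithoutS: /line.Test/.test(value),         // . does not match \n
  s: /line.Test/s.test(value),                  // s: . matches \n as well
};

console.log(regexChecks);
```

Each option turns a failing match into a passing one: the three checks without a flag are false and the three with i, m, or s are true.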

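$text operator

Text search matches whole words only (split on whitespace, so a substring inside a word is not found)
Not case-sensitive
Language-specific search is supported for many languages, but not Korean.
A text index must be created first

because $text works only against the collection's text index

Field	Description
$search	The search string. Unless given as a phrase, returns every document containing any of the space-separated words
$language	Optional. Sets the search language to one that MongoDB supports; if omitted, the language configured on the index is used
$caseSensitive	Optional. Boolean. Whether to distinguish upper and lower case; the default is false (no distinction)
$diacriticSensitive	Optional. Boolean. Whether to distinguish diacritics, the marks above or below letters as in é versus e. The default is false (diacritics are ignored)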

> db.inventory.createIndex({ item:"text" })

> db.inventory.find( {$text:{$search:"test"}} )
{ "_id" : ObjectId("603e498748883f8cbbc18bd9"), "item" : "keep Test" }
{ "_id" : ObjectId("603e47f148883f8cbbc18bd7"), "item" : "keep test" }
{ "_id" : ObjectId("603e47e248883f8cbbc18bd6"), "item" : "test keep" }

Searching for test does not match values such as test_item; only whitespace-separated words, as in keep test / test keep, are matched

$where operator

The $where operator allows a JavaScript expression to be used in a query

# Find documents whose comments field is empty

> db.articles.find( { $where: "this.comments.length == 0" } )
{ "_id" : ObjectId("56c0ab6c639be5292edab0c4"), "title" : "article01", "content" : "content01", "writer" : "Velopert", "likes" : 0, "comments" : [ ] }

$elemMatch operator

The $elemMatch operator is used to query arrays of embedded documents

# Goal: among documents that have a comment written by "Charlie", read only the title and Charlie's comment
# Written this way, Delta's comment is returned as well, contrary to the intent
> db.articles.find(
    {
        "comments": {
            $elemMatch: { "name": "Charlie" }
        }
    },
    {
        "title": true,
        "comments.name": true,
        "comments.message": true
    }
)
{
        "_id" : ObjectId("56c0ab6c639be5292edab0c6"),
        "title" : "article03",
        "comments" : [
                {
                        "name" : "Charlie",
                        "message" : "Hey Man!"
                },
                {
                        "name" : "Delta",
                        "message" : "Hey Man!"
                }
        ]
}

When the field is a single embedded document rather than an array of them, like "name" in the document below

> db.users.find({
    "username": "velopert",
    "name": { "first": "M.J.", "last": "K."},
    "language": ["korean", "english", "chinese"]
  }
)
> db.users.find({ "name.first": "M.J."})

For a plain array rather than an array of documents, query as follows

> db.users.find({ "language": "korean"})

projection

projection is the second parameter of the find() method
It selects which fields appear in the query result

# Fetch only the title and content of each article

> db.articles.find( { } , { "_id": false, "title": true, "content": true } )
{ "title" : "article01", "content" : "content01" }
{ "title" : "article02", "content" : "content02" }
{ "title" : "article03", "content" : "content03" }

$slice operator

Among the projection operators, $slice limits how many elements of an embedded document array are read

# In the document whose title is article03, show only one comment
Without $slice both comments would be read; limiting to 1 returns just one

> db.articles.find({"title": "article03"}, {comments: {$slice: 1}}).pretty()
{
        "_id" : ObjectId("56c0ab6c639be5292edab0c6"),
        "title" : "article03",
        "content" : "content03",
        "writer" : "Bravo",
        "likes" : 40,
        "comments" : [
                {
                        "name" : "Charlie",
                        "message" : "Hey Man!"
                }
        ]
}
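The single-number form of $slice can be mimicked with Array.prototype.slice: a positive count keeps elements from the front and a negative count keeps them from the end. sliceProjection is a hypothetical helper written for this illustration, and the comments array reuses the two comments shown earlier for article03.

```javascript
// Hypothetical stand-in for the projection { <array>: { $slice: n } }
function sliceProjection(array, n) {
  return n >= 0 ? array.slice(0, n) : array.slice(n);
}

const comments = [
  { name: "Charlie", message: "Hey Man!" },
  { name: "Delta", message: "Hey Man!" },
];

console.log(sliceProjection(comments, 1));  // first comment only
console.log(sliceProjection(comments, -1)); // last comment only
```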

$elemMatch

The $elemMatch above is the query (filter) operator,
while this $elemMatch goes in the field list, i.e. the projection parameter.

# Using $elemMatch as a projection operator achieves the goal
# Among documents with a comment written by "Charlie", fetch the title and only Charlie's comment (applying $elemMatch again in the projection hides Delta's comment)
> db.articles.find(
...     {
...         "comments": {
...             $elemMatch: { "name": "Charlie" }
...         }
...     },
...     {
...         "title": true,
...         "comments": {
...             $elemMatch: { "name": "Charlie" }
...         },
...         "comments.name": true,
...         "comments.message": true
...     }
... )
{ "_id" : ObjectId("56c0ab6c639be5292edab0c6"), "title" : "article03", "comments" : [ { "name" : "Charlie", "message" : "Hey Man!" } ] }
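The difference between the two uses can be sketched in plain JavaScript: as a filter, $elemMatch only decides whether the document matches, while as a projection it keeps just the first matching array element. elemMatchProjection is a hypothetical helper that supports equality conditions only; it returns undefined when nothing matches, standing in for the field being left out of the result.

```javascript
// Hypothetical stand-in for the projection { <array>: { $elemMatch: condition } }
function elemMatchProjection(array, condition) {
  const first = array.find((element) =>
    Object.entries(condition).every(([key, expected]) => element[key] === expected));
  return first === undefined ? undefined : [first];
}

const articleComments = [
  { name: "Charlie", message: "Hey Man!" },
  { name: "Delta", message: "Hey Man!" },
];

console.log(elemMatchProjection(articleComments, { name: "Charlie" })); // Charlie's comment only
console.log(elemMatchProjection(articleComments, { name: "Echo" }));    // undefined: no match
```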

[findAndModify]

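Modifies a single document and returns it
By default the returned document is the one before the modification
To get the modified document back, use the new option

With new: true the document after the change is returned; by default (new: false or omitted) the document before the change is returned

# In monsters, set att to 350 for the document whose name is 'Demon' and return the result after the change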
> db.monsters.findAndModify({ query: { name: 'Demon' }, update: { $set: { att: 350 } }, new: true })

db.collection.findAndModify({
    query: <document>,
    sort: <document>,
    remove: <boolean>,
    update: <document or aggregation pipeline>, // Changed in MongoDB 4.2
    new: <boolean>,
    fields: <document>,
    upsert: <boolean>,
    bypassDocumentValidation: <boolean>,
    writeConcern: <document>,
    collation: <document>,
    arrayFilters: [ <filterdocument1>, ... ]
});

> db.monsters.findAndModify({
    query: { name: "Dragon" },
    update: { $inc: { "att": 1000 }, $set: { "hp": 4000 } },  // $inc and $set must not target the same field (att)
    upsert: true,
    new : true
})

> db.monsters.findAndModify({
    query: { name: "Dragon" },
    update: {$set :{"hp":2000,"att":2000}},
    upsert: true,
    new : true
})

> db.monsters.findAndModify({
    query: { name: "Dragon" },
    update: {$set :{"hp":3000,"att":2000}},
    upsert: true
})

# upsert / new
> db.monsters.find({ name: "Dragon_Baby" })
> db.monsters.findAndModify({
    query: { name: "Dragon_Baby" },
    update: {$set :{"hp":1000,"att":500}},
    upsert: true,
    new : true
})

# remove
> db.monsters.findAndModify({
    query: { name: "Dragon_Baby" },
    remove: true
})

# When several documents match
> db.monsters.findAndModify({
    query: { name: "Dragon" },
    update: {$set :{"hp":4000,"att":5000}},
    upsert: true
})

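new Option : true

upsert: true / new : true

remove : true

new: true
- Error, because a removed document has no post-change version to return
update : {xxxx}
- Error: the document would be removed, so update cannot be combined with remove
When remove succeeds
- The document as it was before removal is returned, then deleted

When several documents match

Only one document is changed (as with update)

Reference: https://velopert.com/479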
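The effect of the new option, and the fact that only one document is touched, can be sketched with an in-memory array. findAndModifySketch is a hypothetical stand-in written for this illustration: it supports only equality queries and $set, and leaves out upsert, remove, and sort.

```javascript
// Hypothetical in-memory stand-in for findAndModify
function findAndModifySketch(collection, { query, update, new: returnNew = false }) {
  const matches = (doc) => Object.entries(query).every(([key, value]) => doc[key] === value);
  const index = collection.findIndex(matches); // only the first match is modified
  if (index === -1) return null;
  const before = collection[index];
  const after = { ...before, ...update.$set };
  collection[index] = after;
  return returnNew ? after : before;
}

// Made-up data with two matching documents
const monsters = [
  { name: "Demon", att: 100 },
  { name: "Demon", att: 200 },
];

const returned = findAndModifySketch(monsters, {
  query: { name: "Demon" },
  update: { $set: { att: 350 } },
});

console.log(returned); // the pre-change document, because new was not set
console.log(monsters); // only the first matching document was changed
```

Passing new: true to the same call would return the document with att already set to 350.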

Cursor

Covered by the document I shared during the earlier MongoDB Study-Break.
hyunki1019.tistory.com/159


[MongoDB] [Study-6] MongoDB CRUD Write Operations

2021. 3. 28. 22:00

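Basic data handling

ObjectID
- It is not a clustered index; in MongoDB every index is a non-clustered index
- Because it is generated as described below, it can presumably be generated by the client
- The first 4 bytes are the Unix time
- The next 3 bytes are a machine id
- The next 2 bytes are a process id (newer MongoDB versions replace these 5 machine/process bytes with a per-process random value)
- The last 3 bytes are a counter that starts from a random value
- https://steveridout.github.io/mongo-object-time/

If _id is not given explicitly on insert, the "_id" field is generated automatically as an ObjectID
_id is assigned a non-colliding ObjectID value; think of it as the primary key
Values generated at the same moment still differ, so they are unique
An ObjectID is composed of "Unix time + machine id + process id + counter"
ObjectId.getTimestamp() returns the creation time (so ObjectIDs can be used for time-range searches)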

> ObjectId("60371e375adcd7d623c78b3d").getTimestamp()
ISODate("2021-02-25T03:49:11Z")

> ObjectId("60371e8d5adcd7d623c78b3e").toString()
ObjectId("60371e8d5adcd7d623c78b3e")

> ObjectId("60371e8d5adcd7d623c78b3e").valueOf()
60371e8d5adcd7d623c78b3e
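Since the first 4 bytes are the Unix time, the timestamp can be recovered from the hex string alone. The sketch below does this in plain JavaScript without the shell's ObjectId type; objectIdTimestamp and objectIdLowerBound are hypothetical helper names introduced for this example.

```javascript
// Hypothetical helper: decodes the creation time from a 24-character ObjectID hex string
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) throw new Error("expected 24 hex characters");
  const seconds = parseInt(hex.slice(0, 8), 16); // first 4 bytes = Unix time in seconds
  return new Date(seconds * 1000);
}

// Hypothetical helper: smallest ObjectID hex string for a given time, usable as
// the lower bound of a time-range search on _id
function objectIdLowerBound(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

console.log(objectIdTimestamp("60371e375adcd7d623c78b3d").toISOString()); // 2021-02-25T03:49:11.000Z
console.log(objectIdLowerBound(new Date("2021-02-25T00:00:00Z")));
```

The first line reproduces the getTimestamp() result shown above. In the shell, the second value could be wrapped in ObjectId(...) and used with $gte on _id to select documents created from that time onward.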

UUID

Source: charsyam, [입 개발] Global Unique Object ID 생성 방법에 대한 정리
Short for Universally Unique Identifier; a 16-octet (128-bit) value written as 32 hexadecimal digits
Standardized by the OSF (Open Software Foundation, a non-profit founded in 1988 under the US National Cooperative Research and Production Act of 1984 to create an open standard for the Unix operating system)
UUIDs can be implemented in several ways; the best known use a MAC address or a hash (MD5, SHA-1)
Since a MAC address is itself unique, a UUID can be built by appending the current time to it

CRUD

Create, Read, Update, Delete
See "Week 4: Learning the Basic Commands" ([MongoDB][Study-4] MongoDB 기본 명령어 익히기)
This post adds notes on runCommand

runCommand

A command often seen in tools
A helper that runs a database command against the specified database
Once the command document has the right shape, runCommand is easy to use

# Commonly used commands can be passed as the command document
# Default
> db.runCommand(
  {
	  <command>
  }
)

# Administrative commands can be run as follows
> db.adminCommand( { <command> } )

Command types (what goes in <command>)

https://docs.mongodb.com/manual/reference/command/
The first key names the command to run (insert, update, delete, find, etc.) and its value is the target collection name

insert

> db.runCommand(
  {
    insert: <collection>,
    documents: [ <document>, <document>, <document>, ... ], // documents to insert, as an array
    ordered: <boolean>, // default: true (stops at the first failure and skips the remaining inserts; documents already inserted stay) / false: continues with the remaining inserts after a failure
    writeConcern: { <write concern> }, // do not set explicitly when running inside a transaction; the default is recommended
    bypassDocumentValidation: <boolean>, // when true, document validation is skipped
    comment: <any> // when set, it is written to the attr.command.cursor.comment field of mongod log messages (v4.4) -> I could not find it there. db.adminCommand( { getLog: } ) Reference: https://docs.mongodb.com/manual/reference/command/getLog/#dbcmd.getLog Reference 2: https://docs.mongodb.com/manual/reference/log-messages/#log-messages-ref
  }
)

# insert Sample
> db.runCommand(
	{
      insert : 'employee'
      , documents:
      [
          {name : 'Hyungi.Kim'
          , Pos : ['dba','devops','develope']
          , wanted:['job','money']
          }
      ]
	}
)

# comment
> db.runCommand(
	{
		insert : 'employee'
		, documents:
		[
          {name : 'Hyungi'
          , Pos : ['dba','devops','develope']
          , wanted:['job','money']
          , memo : 'Add Comment'
          }
		]
	, comment : 'Comment Write by Hyungi.Kim'
	}
)

# insertMany
> db.runCommand(
  {
    insert : 'employee'
    , documents:
    [
      {name : 'Louis'
      , Pos : ['DB Engineer']
      , wanted:['dba']
      } ,
      {name : 'Bong'
      , Pos : ['Student']
      , wanted:['Soccer Player']
      }
    ]
  }
)

> db.runCommand(
  {
  insert : 'students'
  , documents:
    [
      { "_id" : 7, semester: 3, "grades" : [ { grade: 80, mean: 75, std: 8 },
        { grade: 85, mean: 90, std: 5 },
        { grade: 90, mean: 85, std: 3 } ] }
        ,
        { "_id" : 8, semester: 3, "grades" : [ { grade: 92, mean: 88, std: 8 },
        { grade: 78, mean: 90, std: 5 },
      { grade: 88, mean: 85, std: 3 } ] }
    ]
  }
)
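The difference between ordered: true and ordered: false can be sketched with an in-memory Map keyed by _id. insertBatch is a hypothetical stand-in for the insert command, with a duplicate _id as the only kind of failure it models; the batch is made up.

```javascript
// Hypothetical in-memory stand-in for the insert command's ordered option
function insertBatch(collection, documents, ordered = true) {
  const writeErrors = [];
  let n = 0;
  for (const [index, doc] of documents.entries()) {
    if (collection.has(doc._id)) {
      writeErrors.push({ index, errmsg: `duplicate _id ${doc._id}` });
      if (ordered) break; // ordered: stop here, the remaining documents are skipped
      continue;           // unordered: record the error and keep going
    }
    collection.set(doc._id, doc);
    n += 1;
  }
  return { n, writeErrors };
}

// Made-up batch whose second document repeats _id 1
const batch = [{ _id: 1 }, { _id: 1 }, { _id: 2 }];

const orderedTarget = new Map();
const unorderedTarget = new Map();
console.log(insertBatch(orderedTarget, batch, true));    // stops at the duplicate; _id 2 is never tried
console.log(insertBatch(unorderedTarget, batch, false)); // skips the duplicate; _id 2 is inserted
```

In both modes the document inserted before the failure stays in the collection; nothing is rolled back.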

update

# update 
> db.runCommand( 
   { 
      update: <collection>, 
      updates: [ 
         { 
           q: <query>,   // q: the match condition, e.g. name is Louis (the WHERE clause)
           u: <document or pipeline>,   // u: the change to apply (the SET clause)
           upsert: <boolean>,  // default: false (when true, inserts if no document matches)
           multi: <boolean>,  // default: false (when true, updates every matching document)
           collation: <document>, 
           arrayFilters: <array>, 
           hint: <document|string> 
         }, 
         ... 
      ], 
      ordered: <boolean>, 
      writeConcern: { <write concern> }, 
      bypassDocumentValidation: <boolean>, 
      comment: <any> 
   } 
)
> db.runCommand(  
    { 
        update : 'employee' 
        , updates : [ 
            { 
                q: {name : 'Louis'} 
                ,u : [{$set: {Post : ['DBA','Data Engineer'], wated:['Data Engineer']}}] 
            } 
        ] 
    } 
)

> db.runCommand(  
    { 
        update : 'employee' 
        , updates : [ 
            { 
                q: {name : 'DotDot'}  
                ,u : [{$set: {Post : ['DBA','Data Engineer'], wated:['Wanted Girl Friend']}}] 
            } 
        ] 
    } 
)

> db.runCommand(  
    { 
        update : 'employee' 
        , updates : [ 
            { 
                q: {name : 'DotDot'} 
                ,u : [{$set: {Post : ['DBA','Data Engineer'], wated:['Wanted Girl Friend']}}]
                ,upsert: true
            } 
        ] 
    } 
)

> db.runCommand({find:'employee', filter : {Post:'Data Engineer'}})

# With multi: true, every matching document is changed
# Note: if the array holds several values, $set overwrites the whole array
> db.runCommand(  
    { 
        update : 'employee' 
        , updates : [ 
            { 
                q: {Post:'Data Engineer'}  
                ,u : [{$set: {Post : ['Data Science']}}] 
                ,multi : true
            } 
        ] 
    } 
)
# With multi left at the default (false), only one document is changed
> db.runCommand(  
    { 
        update : 'employee' 
        , updates : [ 
            { 
                q: {Post:'Data Science'}  
                ,u : [{$set: {Post : ['Data Master']}}]  
            } 
        ] 
    } 
)
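How upsert and multi change the outcome can be sketched with an in-memory array. applyUpdate is a hypothetical stand-in for one entry of the updates array: it supports only equality conditions in q and a plain { $set: ... } document in u (not the pipeline form used above), and the employees data is made up.

```javascript
// Hypothetical in-memory stand-in for a single update statement
function applyUpdate(collection, { q, u, upsert = false, multi = false }) {
  const matched = collection.filter((doc) =>
    Object.entries(q).every(([key, value]) => doc[key] === value));
  if (matched.length === 0) {
    if (!upsert) return { n: 0, upserted: false };
    collection.push({ ...q, ...u.$set }); // upsert: insert a new document built from q and $set
    return { n: 1, upserted: true };
  }
  const targets = multi ? matched : matched.slice(0, 1); // default: first match only
  targets.forEach((doc) => Object.assign(doc, u.$set));
  return { n: targets.length, upserted: false };
}

const employees = [
  { name: "Louis", team: "db" },
  { name: "Bong", team: "db" },
];

console.log(applyUpdate(employees, { q: { name: "DotDot" }, u: { $set: { team: "db" } } }));               // no match, nothing happens
console.log(applyUpdate(employees, { q: { name: "DotDot" }, u: { $set: { team: "db" } }, upsert: true })); // no match, document inserted
console.log(applyUpdate(employees, { q: { team: "db" }, u: { $set: { team: "data" } } }));                 // three matches, one changed
console.log(applyUpdate(employees, { q: { team: "db" }, u: { $set: { team: "ml" } }, multi: true }));      // two remaining matches, both changed
console.log(employees);
```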

update

upsert

multi

find

Projection
- Chooses which fields to return
- Operators can also be used

> db.runCommand( 
   { 
      "find": <string>,  // find: collection name
      "filter": <document>,  // search condition (WHERE); if omitted, the whole collection is returned
      "sort": <document>,    // ORDER BY
      "projection": <document>,   // fields to return; operators can be used
      "hint": <document or string>, 
      "skip": <int>,  // default: 0 - number of documents to skip before returning results
      "limit": <int>,  // default: no limit - maximum number of documents to return
      "batchSize": <int>, // number of documents per batch; default: 101
      "singleBatch": <bool>, 
      "comment": <any>, 
      "maxTimeMS": <int>, 
      "readConcern": <document>, 
      "max": <document>, 
      "min": <document>, 
      "returnKey": <bool>, 
      "showRecordId": <bool>, 
      "tailable": <bool>, 
      "oplogReplay": <bool>, 
      "noCursorTimeout": <bool>, 
      "awaitData": <bool>, 
      "allowPartialResults": <bool>, 
      "collation": <document>, 
      "allowDiskUse" : <bool> 
   } 
)

# find 
> db.runCommand(  
    {  
        find : 'employee' 
        , filter : {name : 'Louis'} 
        , projection : {name : 1} 
    }  
) 
> db.runCommand(   
    {  
        find : 'employee'  
        , projection : {name : 1}  
    }  
) 
> db.runCommand(   
    {  
        find : 'employee'  
        , filter : {name : 'Louis'}  
        , projection : {name : 1}  
        , sort : {name : 1} 
        , limit : 1 
    }  
)
# Test
> db.runCommand( 
     { 
         insert : 'students' 
         , documents: 
             [ 
                 { "_id" : 7, semester: 3, "grades" : [ { grade: 80, mean: 75, std: 8 }, 
                                                        { grade: 85, mean: 90, std: 5 }, 
                                                        { grade: 90, mean: 85, std: 3 } ] } 
                 , 
                 { "_id" : 8, semester: 3, "grades" : [ { grade: 92, mean: 88, std: 8 }, 
                                                        { grade: 78, mean: 90, std: 5 }, 
                                                        { grade: 88, mean: 85, std: 3 } ] } 
             ] 
     } 
)
-> Among documents where grades.mean > 70, return only the first element of grades that satisfies the condition
> db.students.find(  
   { "grades.mean": { $gt: 70 } }, 
   { "grades.$": 1 }  
)

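Result
{ "_id" : 7, "grades" : [ { "grade" : 80, "mean" : 75, "std" : 8 } ] } 
{ "_id" : 8, "grades" : [ { "grade" : 92, "mean" : 88, "std" : 8 } ] }

{array: {$elemMatch: {field: value}}}

Used in the filter, a document matches when any array element satisfies the condition, and the whole array is returned

{array: {$slice: number of elements to return}}
- ex) limiting the number of comments returned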
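
To make the array projections above concrete, here is a minimal sketch in plain JavaScript that imitates the positional ($) projection and $slice on an in-memory document. It does not call MongoDB; projectFirstMatch and projectSlice are hypothetical helper names chosen for this illustration only.

```javascript
// Imitates the positional ($) projection: keep only the first array element
// that satisfies the predicate. Plain JavaScript, not a MongoDB API.
function projectFirstMatch(doc, arrayField, predicate) {
  const first = doc[arrayField].find(predicate);
  return { _id: doc._id, [arrayField]: first === undefined ? [] : [first] };
}

// Imitates {$slice: n}: keep the first n elements, or the last |n| when n < 0.
function projectSlice(doc, arrayField, n) {
  const arr = doc[arrayField];
  const kept = n >= 0 ? arr.slice(0, n) : arr.slice(n);
  return { ...doc, [arrayField]: kept };
}

const student7 = {
  _id: 7,
  semester: 3,
  grades: [
    { grade: 80, mean: 75, std: 8 },
    { grade: 85, mean: 90, std: 5 },
    { grade: 90, mean: 85, std: 3 },
  ],
};

const firstMatch = projectFirstMatch(student7, "grades", (g) => g.mean > 70);
const lastTwo = projectSlice(student7, "grades", -2);
console.log(JSON.stringify(firstMatch));
console.log(lastTwo.grades.map((g) => g.grade));
```

The first call mirrors the { "grades.$": 1 } query shown earlier: only the first element with mean > 70 survives.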

delete

Does not work on capped collections.
Otherwise it behaves the same way.

> db.runCommand(
    { 
       delete: <collection>,  // name of the collection to delete from
       deletes: [ 
          { 
            q : <query>,  // condition
            limit : <integer>,  // required (0: delete every matching document, 1: delete a single matching document)
            collation: <document>, 
            hint: <document|string>, 
            comment: <any> 
          }, 
          ... 
       ], 
       ordered: <boolean>, 
       writeConcern: { <write concern> } 
    }
)

# Omitting limit raises an error
> db.runCommand( 
    {  
       delete: "students"
        , deletes : [
            {q: {"_id" : 7} , limit:1}
        ]
    }
)

# To delete every matching document, set limit : 0
> db.runCommand( 
    {  
       delete: "students" 
        , deletes : [ 
            {q: {"semester" : 5} , limit:0} 
        ] 
    } 
)

# AND condition

> db.runCommand( 
    {  
       delete: "students" 
        , deletes : [ 
            {q: {"semester" : 5, "_id":1} , limit:0}  
        ] 
    } 
)

Command Type	Name	Description
Aggregation	aggregate	Performs aggregation tasks such as group using the aggregation framework.
	count	Counts the number of documents in a collection or a view.
	distinct	Displays the distinct values found for a specified key in a collection or a view.
	mapReduce	Performs map-reduce aggregation for large data sets.
Geospatial	geoSearch	Performs a geospatial query that uses MongoDB’s haystack index functionality.
Command	delete	Deletes one or more documents.
	find	Selects documents in a collection or a view.
	findAndModify	Returns and modifies a single document.
	getLastError	Returns the success status of the last operation.
	getMore	Returns batches of documents currently pointed to by the cursor.
	insert	Inserts one or more documents.
	resetError	Deprecated. Resets the last error status.
	update	Updates one or more documents.
Query Plan Cache	planCacheClear	Removes cached query plan(s) for a collection.
	planCacheClearFilters	Clears index filter(s) for a collection.
	planCacheListFilters	Lists the index filters for a collection.
	planCacheSetFilter	Sets an index filter for a collection.
Authentication	authenticate	Starts an authenticated session using a username and password.
	getnonce	This is an internal command to generate a one-time password for authentication.
	logout	Terminates the current authenticated session.
User Management	createUser	Creates a new user.
	dropAllUsersFromDatabase	Deletes all users associated with a database.
	dropUser	Removes a single user.
	grantRolesToUser	Grants a role and its privileges to a user.
	revokeRolesFromUser	Removes a role from a user.
	updateUser	Updates a user’s data.
	usersInfo	Returns information about the specified users.
Role Management	createRole	Creates a role and specifies its privileges.
	dropRole	Deletes the user-defined role.
	dropAllRolesFromDatabase	Deletes all user-defined roles from a database.
	grantPrivilegesToRole	Assigns privileges to a user-defined role.
	grantRolesToRole	Specifies roles from which a user-defined role inherits privileges.
	invalidateUserCache	Flushes the in-memory cache of user information, including credentials and roles.
	revokePrivilegesFromRole	Removes the specified privileges from a user-defined role.
	revokeRolesFromRole	Removes specified inherited roles from a user-defined role.
	rolesInfo	Returns information for the specified role or roles.
	updateRole	Updates a user-defined role.
Replication	applyOps	Internal command that applies oplog entries to the current data set.
	isMaster	Displays information about this member’s role in the replica set, including whether it is the master.
	replSetAbortPrimaryCatchUp	Forces the elected primary to abort sync (catch up) then complete the transition to primary.
	replSetFreeze	Prevents the current member from seeking election as primary for a period of time.
	replSetGetConfig	Returns the replica set’s configuration object.
	replSetGetStatus	Returns a document that reports on the status of the replica set.
	replSetInitiate	Initializes a new replica set.
	replSetMaintenance	Enables or disables a maintenance mode, which puts a secondary node in a RECOVERING state.
	replSetReconfig	Applies a new configuration to an existing replica set.
	replSetResizeOplog	Dynamically resizes the oplog for a replica set member. Available for WiredTiger storage engine only.
	replSetStepDown	Forces the current primary to step down and become a secondary, forcing an election.
	replSetSyncFrom	Explicitly override the default logic for selecting a member to replicate from.
Sharding	addShard	Adds a shard to a sharded cluster.
	addShardToZone	Associates a shard with a zone. Supports configuring zones in sharded clusters.
	balancerCollectionStatus	Returns information on whether the chunks of a sharded collection are balanced. New in version 4.4.
	balancerStart	Starts a balancer thread.
	balancerStatus	Returns information on the balancer status.
	balancerStop	Stops the balancer thread.
	checkShardingIndex	Internal command that validates index on shard key.
	clearJumboFlag	Clears the jumbo flag for a chunk.
	cleanupOrphaned	Removes orphaned data with shard key values outside of the ranges of the chunks owned by a shard.
	enableSharding	Enables sharding on a specific database.
	flushRouterConfig	Forces a mongod/mongos instance to update its cached routing metadata.
	getShardMap	Internal command that reports on the state of a sharded cluster.
	getShardVersion	Internal command that returns the config server version.
	isdbgrid	Verifies that a process is a mongos.
	listShards	Returns a list of configured shards.
	medianKey	Deprecated internal command. See splitVector.
	moveChunk	Internal command that migrates chunks between shards.
	movePrimary	Reassigns the primary shard when removing a shard from a sharded cluster.
	mergeChunks	Provides the ability to combine chunks on a single shard.
	refineCollectionShardKey	Refines a collection’s shard key by adding a suffix to the existing key. New in version 4.4.
	removeShard	Starts the process of removing a shard from a sharded cluster.
	removeShardFromZone	Removes the association between a shard and a zone. Supports configuring zones in sharded clusters.
	setShardVersion	Internal command that sets the config server version.
	shardCollection	Enables the sharding functionality for a collection, allowing the collection to be sharded.
	shardingState	Reports whether the mongod is a member of a sharded cluster.
	split	Creates a new chunk.
	splitChunk	Internal command to split chunk. Instead use the methods sh.splitFind() and sh.splitAt().
	splitVector	Internal command that determines split points.
	unsetSharding	Deprecated. Internal command that affects connections between instances in a MongoDB deployment.
	updateZoneKeyRange	Adds or removes the association between a range of sharded data and a zone. Supports configuring zones in sharded clusters.
Session	abortTransaction	Abort transaction. New in version 4.0.
	commitTransaction	Commit transaction. New in version 4.0.
	endSessions	Expire sessions before the sessions’ timeout period. New in version 3.6.
	killAllSessions	Kill all sessions. New in version 3.6.
	killAllSessionsByPattern	Kill all sessions that match the specified pattern New in version 3.6.
	killSessions	Kill specified sessions. New in version 3.6.
	refreshSessions	Refresh idle sessions. New in version 3.6.
	startSession	Starts a new session. New in version 3.6.
Administration Commands	cloneCollectionAsCapped	Copies a non-capped collection as a new capped collection.
	collMod	Add options to a collection or modify a view definition.
	compact	Defragments a collection and rebuilds the indexes.
	connPoolSync	Internal command to flush connection pool.
	convertToCapped	Converts a non-capped collection to a capped collection.
	create	Creates a collection or a view.
	createIndexes	Builds one or more indexes for a collection.
	currentOp	Returns a document that contains information on in-progress operations for the database instance.
	drop	Removes the specified collection from the database.
	dropDatabase	Removes the current database.
	dropConnections	Drops outgoing connections to the specified list of hosts.
	dropIndexes	Removes indexes from a collection.
	filemd5	Returns the md5 hash for files stored using GridFS.
	fsync	Flushes pending writes to the storage layer and locks the database to allow backups.
	fsyncUnlock	Unlocks one fsync lock.
	getDefaultRWConcern	Retrieves the global default read and write concern options for the deployment. New in version 4.4.
	getParameter	Retrieves configuration options.
	killCursors	Kills the specified cursors for a collection.
	killOp	Terminates an operation as specified by the operation ID.
	listCollections	Returns a list of collections in the current database.
	listDatabases	Returns a document that lists all databases and returns basic database statistics.
	listIndexes	Lists all indexes for a collection.
	logRotate	Rotates the MongoDB logs to prevent a single file from taking too much space.
	reIndex	Rebuilds all indexes on a collection.
	renameCollection	Changes the name of an existing collection.
	setFeatureCompatibilityVersion	Enables or disables features that persist data that are backwards-incompatible.
	setIndexCommitQuorum	Changes the minimum number of data-bearing members (i.e commit quorum), including the primary, that must vote to commit an in-progress index build before the primary marks those indexes as ready.
	setParameter	Modifies configuration options.
	setDefaultRWConcern	Sets the global default read and write concern options for the deployment. New in version 4.4.
	shutdown	Shuts down the mongod or mongos process.
Diagnostic Commands	availableQueryOptions	Internal command that reports on the capabilities of the current MongoDB instance.
	buildInfo	Displays statistics about the MongoDB build.
	collStats	Reports storage utilization statistics for a specified collection.
	connPoolStats	Reports statistics on the outgoing connections from this MongoDB instance to other MongoDB instances in the deployment.
	connectionStatus	Reports the authentication state for the current connection.
	cursorInfo	Removed in MongoDB 3.2. Replaced with metrics.cursor.
	dataSize	Returns the data size for a range of data. For internal use.
	dbHash	Returns the hash value of a database and its collections.
	dbStats	Reports storage utilization statistics for the specified database.
	driverOIDTest	Internal command that converts an ObjectId to a string to support tests.
	explain	Returns information on the execution of various operations.
	features	Reports on features available in the current MongoDB instance.
	getCmdLineOpts	Returns a document with the run-time arguments to the MongoDB instance and their parsed options.
	getLog	Returns recent log messages.
	hostInfo	Returns data that reflects the underlying host system.
	isSelf	Internal command to support testing.
	listCommands	Lists all database commands provided by the current mongod instance.
	lockInfo	Internal command that returns information on locks that are currently being held or pending. Only available for mongod instances.
	netstat	Internal command that reports on intra-deployment connectivity. Only available for mongos instances.
	ping	Internal command that tests intra-deployment connectivity.
	profile	Interface for the database profiler.
	serverStatus	Returns a collection of metrics on instance-wide resource utilization and status.
	shardConnPoolStats	Deprecated in 4.4. Use connPoolStats instead. Reports statistics on a mongos’s connection pool for client operations against shards.
	top	Returns raw usage statistics for each database in the mongod instance.
	validate	Internal command that scans for a collection’s data and indexes for correctness.
	whatsmyuri	Internal command that returns information on the current client.
Free Monitoring Commands	setFreeMonitoring	Enables/disables free monitoring during runtime.
Auditing Commands	logApplicationMessage	Posts a custom message to the audit log.

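runCommand can also run non-CRUD commands

Fetching statistics
Initializing a replica set
Aggregation pipelines
map-reduce jobs are all possible

# Check the role of this mongod (whether it is the master)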
> db.runCommand( { isMaster: 1 } )
{ 
         "ismaster" : true, 
        "maxBsonObjectSize" :16777216, 
        "localTime" :ISODate("2013-01-06T19:53:43.647Z"), 
        "ok" : 1 
}


[MongoDB] [Study-Break] A brief summary of cursors

2021. 3. 27. 21:46

After thinking about what would be useful to know, I decided to share a short summary of cursors.

Cursor

A pointer to the result of a query
The find command returns a cursor rather than the documents themselves
Cursor Methods

https://docs.mongodb.com/manual/reference/method/js-cursor/

Name	Description	Memo
cursor.addOption()	Adds special wire-protocol flags that modify the behavior of the query
cursor.allowDiskUse()	Lets a find() with sort write temporary files to disk when the sort needs more than 100 MB of memory. Likewise, aggregate() can use up to 100 MB of memory for sorting by default and fails when more data has to be sorted; with allowDiskUse set to true it can sort on disk, and MongoDB creates a "_tmp" directory under the data directory to hold the temporary working files	See Real MongoDB, p. 702. db.user_scores.aggregate([ {$match:{score:{$gt:50}}}, {$group:{_id:"$name",avg:{$avg:"$score"}}} ],{allowDiskUse:true})
cursor.allowPartialResults()	For a find on a sharded collection, returns partial results instead of an error when some shards cannot be queried
cursor.batchSize()	Number of documents MongoDB returns to the client in a single network message
cursor.close()	Closes the cursor and frees the associated resources
cursor.isClosed()	Returns true if the cursor is closed
cursor.collation()	Specifies the collation for the cursor returned by find()
cursor.comment()	Attaches a comment to the query so that it can be traced in the logs and in the system.profile collection
cursor.count()	Returns the number of documents in the cursor's result set
cursor.explain()	Reports the query execution plan for the cursor
cursor.forEach()	Applies a JavaScript function to every document in the cursor
cursor.hasNext()	Returns true if the cursor has more documents to return
cursor.hint()	Forces the query to use a specific index
cursor.isExhausted()	Returns true if the cursor is closed and no documents remain in the batch
cursor.itcount()	Counts the documents remaining in the cursor by iterating it on the client (similar to find().count(), but it actually exhausts the cursor)
cursor.limit()	Limits the number of documents in the cursor's result
cursor.map()	Applies a function to each document and collects the return values in an array	db.users.find().map( function(u) { return u.name; } ); (similar to forEach)
cursor.max()	Specifies the upper bound for the find; use together with cursor.hint()
cursor.maxTimeMS()	Cumulative time limit for operations on the cursor (ms)
cursor.min()	Specifies the lower bound for the find; use together with cursor.hint()
cursor.next()	Returns the next document in the cursor
cursor.noCursorTimeout()	Disables the timeout that automatically closes idle cursors
cursor.objsLeftInBatch()	Returns the number of documents left in the current cursor batch
cursor.pretty()	Displays the cursor results in an easy-to-read format
cursor.readConcern()	Specifies the read concern for the find() operation
cursor.readPref()	Specifies the read preference that controls how the client routes reads to replica set members
cursor.returnKey()	Modifies the cursor to return index keys rather than documents
cursor.showRecordId()	Adds the storage engine's internal record ID field to each returned document
cursor.size()	Returns the count of documents in the cursor after skip() and limit() are applied
cursor.skip()	Returns results after skipping the given number of documents
cursor.sort()	Returns results ordered according to the sort specification
cursor.tailable()	Marks a cursor on a capped collection as tailable, so it stays open and keeps returning newly inserted documents
cursor.toArray()	Returns an array containing all documents returned by the cursor

cursor.forEach()

A query I found and saved a while ago

db.getCollectionNames().forEach(function(collection) {
  var indexes = db[collection].getIndexes();
  print("Indexes for " + collection + ":");
  printjson(indexes);
});

cursor.itcount()

Source: https://www.javaer101.com/ko/article/14712708.html

> db.SentMessages.find({Type : 'Foo'})
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19dd9834184ad6d3675c"), "Name" : "789", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19d29834184ad6d3675b"), "Name" : "456", "Type" : "Foo" }

> db.SentMessages.find({Type : 'Foo'}).count()
3

> db.SentMessages.find({Type : 'Foo'}).limit(1)
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }

> db.SentMessages.find({Type : 'Foo'}).limit(1).count();
3

> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}} ])
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19dd9834184ad6d3675c"), "Name" : "789", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19d29834184ad6d3675b"), "Name" : "456", "Type" : "Foo" }

> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}} ]).count()
2014-08-12T14:47:12.488+0100 TypeError: Object #<Object> has no method 'count'

> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}} ]).itcount()
3

> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}}, {$limit : 1} ])
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }

> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}}, {$limit : 1} ]).itcount()
1

> exit
bye

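batchSize vs. limit

Reference (this is close to a straight copy): https://emflant.tistory.com/12
batchSize: number of documents fetched per batch.
limit: total number of documents the query returns.
If neither limit nor batchSize is specified, the first batch returns 101 documents; when individual documents are large, the first batch is capped at about 1 MB instead
If limit or batchSize is specified, that many documents are returned

Even with a large value, a single batch cannot carry more than 4 MB of data (the figure quoted by the referenced post; recent MongoDB releases bound a batch by the 16 MB maximum message size instead).
When sorting without an index, all data is loaded in the first batch, but the same size cap still applies.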

> // insert 200 small documents with a for loop
> for (var i = 0; i < 200; i++) { db.foo.insert({i: i}); }
> var cursor = db.foo.find()
> // find without batchSize or limit, so the default first batch of 101 documents applies
> cursor.objsLeftInBatch()
101

> // with a large limit meant to fetch everything at once, the batch holds all the documents
> var cursor = db.foo.find().limit(1000)
> cursor.objsLeftInBatch()
200

> // when batchSize is smaller than limit, batchSize wins
> var cursor = db.foo.find().batchSize(10).limit(1000)
> cursor.objsLeftInBatch()
10

> // when limit is smaller than batchSize, limit wins
> var cursor = db.foo.find().batchSize(10).limit(5)
> cursor.objsLeftInBatch()
5

> // insert 10 documents of 1 MB each
> var megabyte = '';
> for (var i = 0; i < 1024 * 1024; i++) { megabyte += 'a'; }
> for (var i = 0; i < 10; i++) { db.bar.insert({s:megabyte}); }

> // neither limit nor batchSize is set, so the first batch stops at 1 MB
> // as a result the documents are fetched one at a time
> var cursor = db.bar.find()

> cursor.objsLeftInBatch()
1
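
The shell session above can be summarised as a small rule. The following is a rough model in plain JavaScript, not MongoDB's actual implementation: firstBatchCount is a hypothetical helper, and it deliberately ignores the upper size cap that applies when a very large limit is combined with large documents.

```javascript
// Rough model of the first-batch rule (a simplification for illustration).
const DEFAULT_FIRST_BATCH_DOCS = 101;
const FIRST_BATCH_BYTES = 1024 * 1024; // about 1 MB when nothing is specified

function firstBatchCount({ totalDocs, docBytes, batchSize, limit }) {
  const explicit = [batchSize, limit].filter((n) => Number.isInteger(n) && n > 0);
  if (explicit.length > 0) {
    // With batchSize and/or limit set, the smaller value wins.
    return Math.min(totalDocs, ...explicit);
  }
  // Neither set: stop at 101 documents or once the batch reaches about 1 MB.
  const docsUntilCap = Math.max(1, Math.ceil(FIRST_BATCH_BYTES / docBytes));
  return Math.min(totalDocs, DEFAULT_FIRST_BATCH_DOCS, docsUntilCap);
}

const small = { totalDocs: 200, docBytes: 40 };
const plain = firstBatchCount(small);
const bigLimit = firstBatchCount({ ...small, limit: 1000 });
const batchWins = firstBatchCount({ ...small, batchSize: 10, limit: 1000 });
const limitWins = firstBatchCount({ ...small, batchSize: 10, limit: 5 });
const oneMegDocs = firstBatchCount({ totalDocs: 10, docBytes: 1024 * 1024 + 32 });
console.log(plain, bigLimit, batchWins, limitWins, oneMegDocs);
```

The five calls correspond, in order, to the five objsLeftInBatch() checks in the session; docBytes values are assumed sizes, not measured ones.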


[MongoDB][Study-4] Learning basic MongoDB commands

2021. 3. 27. 21:38

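Basic commands

runCommand

A command often seen in tools
A helper that runs a database command against the specified database
Takes a document or a string

The database command is given as a string or a document, and the result is returned as a document

How it works

db.runCommand() runs the command only against the currently selected database
Some commands apply only to the admin database; to run them, switch to admin or use db.adminCommand()

Command result

Field	Description
ok	Indicates whether the command succeeded or failed
operationTime	The logical time of the operation, expressed as the Timestamp of its oplog entry. Applies to replica sets and sharded clusters only. If the command does not generate an oplog entry (e.g. a read), the operation does not advance the logical clock, and operationTime is taken from existing oplog entries instead: for read concern "majority" and "linearizable" it is the timestamp of the most recently majority-acknowledged oplog entry. For operations in causally consistent sessions, MongoDB drivers use this time automatically to set the read operation and cluster time (v3.6)
$clusterTime	Returns the cluster time that was applied. Cluster time is a logical time used for ordering operations. Applies to replica sets and sharded clusters only, and is used only internally by MongoDB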

ex)

Reference: https://docs.mongodb.com/manual/reference/method/db.runCommand/

https://docs.mongodb.com/manual/reference/command/

https://docs.mongodb.com/manual/reference/command/insert/#dbcmd.insert

Database

Creating a database

A database can be created with the use statement
The use statement also selects the database
However, an empty database does not show up when you check again ( mongo> show dbs )

At least one document must be inserted before the database becomes visible

Creating a collection (table)

There are two ways to create a collection; unless there is a special reason, let it be created implicitly by inserting a document
Explicit creation with createCollection lets you set detailed collection options, which can cause unexpected incidents when used without understanding them; implicit creation avoids that risk

As of 4.2 the MMAPv1 storage engine was removed, along with the createCollection options specific to it

The collection list can be checked with show collections ( mongo> show collections )
Preferred: db.<collection>.insert()

Ex) Create a collection named people with name and age fields
db.people.insert({"name":"Louis.Kim", "age":36})

Avoid: db.createCollection("people") or db.createCollection("people", {options and their values})

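createCollection

Capped collection

db.createCollection(<collection name>, {capped : true, size:4096})
Unlike a regular collection, it automatically deletes the oldest data once the fixed size is exceeded

Conversely, users cannot delete individual documents themselves
The only way to remove data is to drop the collection

Works much like a circular buffer
MongoDB's TTL indexes can be considered as an alternative to capped collections

A TTL index on a collection removes expired data.
Capped collections are not compatible with TTL indexes

Sharding is not supported
The $out stage of an aggregation pipeline cannot write its results to a capped collection
size is in bytes
The minimum size is 4096 bytes (because of the space a collection occupies by default)
Document insertion is very fast

A find without an explicit sort returns documents in insertion order, so ordered reads are cheap
The most recently inserted elements can be retrieved efficiently

The size is fixed up front, so no additional space is needed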
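
The circular-buffer behaviour described above can be sketched in a few lines of plain JavaScript. This is only an in-memory illustration under simplifying assumptions: CappedBuffer is a hypothetical class, and the length of the JSON text stands in for the real BSON size.

```javascript
// In-memory imitation of a capped collection: fixed byte budget,
// insertion order preserved, oldest entries evicted first.
class CappedBuffer {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.usedBytes = 0;
    this.entries = []; // kept in insertion order
  }

  insert(doc) {
    const size = JSON.stringify(doc).length; // crude stand-in for BSON size
    if (size > this.maxBytes) throw new Error("document larger than the cap");
    // Evict the oldest documents until the new one fits.
    while (this.usedBytes + size > this.maxBytes) {
      const oldest = this.entries.shift();
      this.usedBytes -= oldest.size;
    }
    this.entries.push({ doc, size });
    this.usedBytes += size;
  }

  find() {
    // No sort needed: entries are already in insertion order.
    return this.entries.map((entry) => entry.doc);
  }
}

const eventLog = new CappedBuffer(60);
for (let i = 1; i <= 5; i++) eventLog.insert({ seq: i, msg: "event" });
const remaining = eventLog.find().map((d) => d.seq);
console.log(remaining);
```

Each sample document serialises to 23 characters, so a 60-byte budget holds two of them and only the two newest survive. The sketch has no delete method on purpose, mirroring the rule that individual documents cannot be removed.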

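> db.createCollection(name, {capped: <Boolean>, autoIndexId: <Boolean>, size: <number>, max: <number>} )

Parameter	Type	Description
name	string	Name of the collection to create
options	document	Options for the collection

option

Field	Type	Description
capped	boolean	Set to true to create a capped collection, false otherwise
autoIndexId	boolean	Whether the index on _id is created automatically (true by default)
size	number	Maximum size of the collection in bytes (required when capped is true)
max	number	Maximum number of documents to store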

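My own musings

Capped collection

> If the policy is to retain 100 days of data, large documents will be evicted before 100 days have passed, while small documents will linger for longer than 100 days...???

> It seems a better fit when storage is limited and the data is not critical but must be kept within a fixed budget, since it removes a management task...

> But if data is continuously being added and evicted, there will be steady I/O as well... so in many ways it may be better not to use it.

> It looks useful for log data or statistics that only need to be kept for a limited period.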

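view

A read-only view over a predefined query
Because no data is actually stored and read back, the available commands are restricted (views are defined with aggregation pipeline syntax)

Document

[Delete]

Delete (recommended)

Methods added in 3.2 that replace remove
db.<collection>.deleteMany(condition)
Ex) Delete the documents whose name is Louis.Kim from the employee collection
db.employee.deleteMany({"name":"Louis.Kim"})
db.<collection>.deleteOne(condition)

Deletes only one document that matches the condition

Remove (avoid)

db.<collection>.remove(condition)
deleteMany and remove behave almost identically, but remove is slated to lose support, so deleteMany is recommended
In addition, when deleting a single document (i.e. the condition is known to be unique), deleteOne performs marginally better than deleteMany, so choose according to the situation (single-document versus multi-document processing)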

[Update]

Recommendations

Updates match case-sensitively, so search and update with the exact case
Unless several documents really need updating, avoid using the multi option indiscriminately.
Check explain to confirm that an index is used, and search by _id where possible
Avoid the save command (for both insert and update)
Use the upsert option only where an insert is wanted when no document matches; everywhere else, call insert or update explicitly

db.collection.update(
  {query condition},
  {$set:{fields to change}}
  , {
    upsert: <boolean>, // insert a new document if nothing matches; default false (no insert)
    multi: <boolean>, // update multiple documents; default false (only one document)
    writeConcern: <document>,
    collation: <document>, // 3.4
    arrayFilters: [ <filterdocument1>, ... ], // 3.6
    hint: <document | string> // MongoDB 4.2
  } // options may be omitted
)

db.<collection>.update({condition}, {$set:{fields to change}})

Set the hp of Slime to 30 in the monsters collection
db.monsters.update({ name: 'Slime' }, { $set: { hp: 30 } });
// WriteResult({ nMatched: 1, nUpserted: 0, nModified: 1 });

Without $set, the whole document is replaced and only {fields to change} remains

db.monsters.update({ name: 'Slime' }, { hp: 30 } );
This leaves only hp:30 (plus _id) and removes every other field

In addition, $inc makes it easy to adjust an existing value

Reduce Slime's hp by 30 without knowing its current value
db.monsters.update({name:'Slime'},{$inc:{hp:-30}})
// WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

When more than one document must be updated, the multi option has to be added
(without the multi option only one document is changed)
> db.monsters.update({name:'Slime'},{$inc:{hp:-80}},{multi:true})
// WriteResult({ "nMatched" : 2, "nUpserted" : 0, "nModified" : 2 })
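
The difference between an operator update ($set, $inc) and a bare replacement document is easy to get wrong, so here is a small plain JavaScript imitation of it. applyUpdate is a hypothetical helper that covers only $set and $inc; it is a sketch of the semantics described above, not the server's update engine.

```javascript
// Imitates how the legacy update() treats its second argument
// (simplified: only $set and $inc are handled).
function applyUpdate(doc, update) {
  const hasOperators = Object.keys(update).some((key) => key.startsWith("$"));
  if (!hasOperators) {
    // Replacement-style update: everything except _id is replaced.
    return { _id: doc._id, ...update };
  }
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$set ?? {})) {
    result[field] = value;
  }
  for (const [field, delta] of Object.entries(update.$inc ?? {})) {
    result[field] = (result[field] ?? 0) + delta;
  }
  return result;
}

const slime = { _id: 1, name: "Slime", hp: 25, att: 8 };
const withSet = applyUpdate(slime, { $set: { hp: 30 } });
const withoutSet = applyUpdate(slime, { hp: 30 });
const afterInc = applyUpdate(withSet, { $inc: { hp: -30 } });
console.log(withSet, withoutSet, afterInc);
```

withSet keeps name and att, while withoutSet is reduced to _id and hp, which is exactly the data loss the note above warns about. The sample field values (hp: 25, att: 8) are made up for the illustration.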

[Update - Option]

Parameter	Type	Description
upsert	boolean	Optional (default: false). If true, a new document is inserted when no document matches the query
multi	boolean	Optional (default: false). If true, multiple documents are updated
writeConcern	document	Optional. Settings such as wtimeout applied when updating documents; omit it to use the default write concern. See Write Concern in the MongoDB Manual
collation	document	Optional. Specifies language-specific rules for string comparison, such as rules for letter case and accent marks. It is questionable whether Korean needs this (it has no case or accent rules), but it is worth considering for multilingual support. collation: { locale: <string>, // language-specific rules to apply caseLevel: <boolean>, caseFirst: <string>, strength: <int>, // 1 or 2: case-insensitive comparison (default 3: case-sensitive) numericOrdering: <boolean>, alternate: <string>, maxVariable: <string>, backwards: <boolean> }
arrayFilters	array	Optional. Filter documents that determine which array elements an update modifies; referenced in the update with $[<identifier>]
hint	document or string	Optional. Forces a specific index. Ex) force the status index: db.members.createIndex( { status: 1 } ) db.members.createIndex( { points: 1 } ) db.members.update( { points: { $lte: 20 }, status: "P" }, // Query parameter { $set: { misc1: "Need to activate" } }, // Update document { multi: true, hint: { status: 1 } } // Options )

[Update]

[Multi - Option Test]

[Collation - case comparison]

Case-sensitive by default

Proceed after applying the collation strength option

[arrayFilters]

Find

There are several find-related commands; this guide covers the most frequently used ones, and more will be added as needed
When using find, it is recommended to project only the fields you need
Ex) db.bios.find( {condition}, {_id:0, name:1 , money:1})

To show everything except empno, write > db.thing.find( { }, {empno: 0} )
Only _id may be mixed; for other fields, 1 and 0 cannot be combined in the same projection

When filtering, use _id where possible, and bound range searches to a specific period

Ex) db.bios.find( {"_id":{"$gte":ObjectId("5dfaf5c00000000000000000"), "$lte":ObjectId("5dfaf5c00000000000000000")}}, {_id:0, name:1 , money:1})
(Avoid unbounded searches such as "everything before today" or "every date since launch"; prefer bounded searches such as "from a week ago until yesterday" or "the last day up to now")
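
A period can be turned into _id bounds because an auto-generated ObjectId starts with a 4-byte timestamp (seconds since the Unix epoch). The sketch below builds the hex string for the smallest ObjectId at a given time; objectIdHexFromDate is a hypothetical helper, and the approach only works when _id values were generated as ObjectIds.

```javascript
// Builds the 24-hex-character string of the smallest ObjectId for a timestamp:
// 4-byte seconds since the Unix epoch, followed by 16 zero hex digits.
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

const fromHex = objectIdHexFromDate(new Date("2019-12-19T04:00:00Z"));
const toHex = objectIdHexFromDate(new Date("2019-12-20T04:00:00Z"));
console.log(fromHex, toHex);
// In the shell these strings would be wrapped as ObjectId(fromHex) and
// ObjectId(toHex) and used as the $gte / $lt bounds on _id.
```

The first value reproduces the 5dfaf5c0... prefix used in the example query, which corresponds to 2019-12-19 04:00 UTC.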
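Specifying fields

> db.thing.find( { }, {empno: 1} )
// shows empno; _id is shown by default; other fields (ename) are not shown

> db.thing.find( { }, {empno: 0} )
// hides only empno; the remaining fields (ename, _id) are shown

> db.thing.find( { }, {_id:0, empno: 1} )
// shows only empno; _id and all other fields are hidden
// the key point: _id is returned unless it is explicitly set to 0, so list _id:0 and set only the wanted fields to 1

// for any other field, e.g. ename, adding ename:0 to an inclusion projection raises an error,
// because inclusion (1) and exclusion (0) cannot be mixed in one projection (_id is the only exception)
> db.thing.find( { }, {empno: 1, ename:0} )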
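
The projection rule can be written down as a small check. This is a plain JavaScript sketch of the rule only, with a hypothetical helper name (projectionMode); it ignores projection operators such as $slice and $elemMatch.

```javascript
// Returns "inclusion" or "exclusion", or throws when 1 and 0 are mixed
// on fields other than _id.
function projectionMode(projection) {
  const flags = Object.entries(projection)
    .filter(([field]) => field !== "_id") // _id is the one allowed exception
    .map(([, flag]) => Boolean(flag));
  const includes = flags.some((flag) => flag);
  const excludes = flags.some((flag) => !flag);
  if (includes && excludes) {
    throw new Error("cannot mix inclusion and exclusion in one projection");
  }
  return includes ? "inclusion" : "exclusion";
}

console.log(projectionMode({ empno: 1 }));
console.log(projectionMode({ empno: 0 }));
console.log(projectionMode({ _id: 0, empno: 1 }));

let mixedRejected = false;
try {
  projectionMode({ empno: 1, ename: 0 });
} catch (err) {
  mixedRejected = true;
}
console.log(mixedRejected);
```

The last call corresponds to the failing {empno: 1, ename: 0} query: once one field is included, every unlisted field is already excluded, so an explicit 0 is contradictory.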

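Command	Description
find()	Search
findAndModify()	Search and then modify; supports update, upsert and remove. With new : true the document is returned as it is after the update; with new : false or omitted, the pre-update document is returned. db.monsters.findAndModify({ query: { name: "Dragon" }, update: { $inc: { att: 1000 }, $set: { "name": "Dragon", "hp": 4000 } }, upsert: true, new : true })
findOne()	Search for a single document
findOneAndDelete()	Find a single document and delete it
findOneAndReplace() (v3.2+)	Find a single document and replace it. Set returnNewDocument : true to get the document after the change instead of the original. An update changes only the named fields, whereas a replace removes every field not named in the replacement, so prefer update where possible
findOneAndUpdate() (v3.2+)	Find a single document and update it. Set returnNewDocument : true to get the document after the change instead of the original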

[findAndModify]

db.monsters.findAndModify({ query: { name: 'Demon' }, update: { $set: { att: 350 } }, new: true })

[findAndModify]

db.monsters.findAndModify({
  query: { name: "Dragon" },
  // $inc and $set must not target the same field (att), so att is only incremented here
  update: { $inc: { att: 1000 }, $set: { "name": "Dragon", "hp": 4000 } },
  upsert: true,
  new : true
})

Cursor

A pointer to the result of a query
The find command returns a cursor (pointer) rather than the documents themselves

This is for performance (think of it as returning the address where the results are gathered rather than the result values)

A cursor exists only to read results temporarily, so it becomes inactive after a 10-minute idle timeout
Running find stores the matched results in a batch
Typically the first batch holds 101 documents, and the shell displays them 20 at a time
Call next() to fetch a single document
When the cursor reaches the 102nd document, the next batch is loaded starting from the 102nd result, and the cursor points at that document
Returning documents through a cursor

To load every document of a find at once, use the toArray() method

> var cursor = db.cappedCollection.find()
> cursor
> db.cappedCollection.find()
Results...

> cursor.next()
Returns a single document

> cursor.hasNext()
true

Fetching all documents

> db.cappedCollection.find().toArray()

toArray() returns an array containing every result of the find

If not every value is needed, toArray is inefficient
If the total document size is very large, toArray can exceed the available memory
With forEach the cursor loads each document in turn, so the work can be done with efficient memory use
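
The contrast between iterating a cursor and calling toArray() can be imitated with a generator. This is a plain JavaScript sketch with a hypothetical cursorOver function; the source array already lives in memory here, so it only illustrates the access pattern, not real memory savings.

```javascript
// Imitates a cursor that pulls documents from the "server" one batch at a time.
function* cursorOver(allDocs, batchSize = 101) {
  for (let start = 0; start < allDocs.length; start += batchSize) {
    const batch = allDocs.slice(start, start + batchSize); // one round trip
    yield* batch; // hand out the batch one document at a time
  }
}

const sampleDocs = Array.from({ length: 250 }, (_, i) => ({ i }));

// forEach-style: handle one document at a time, keep only a running total.
let runningSum = 0;
for (const doc of cursorOver(sampleDocs)) {
  runningSum += doc.i;
}

// toArray-style: materialise every document in one array.
const everything = [...cursorOver(sampleDocs)];
console.log(runningSum, everything.length);
```

With the forEach-style loop only the current batch and the running total are needed at any moment, whereas the toArray-style call holds all 250 documents at once.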

https://docs.mongodb.com/manual/reference/method/js-cursor/

[Other]

Terminate Running Operations

Running operations can be terminated with maxTimeMS() and db.killOp()
Use them as needed to control the behavior of operations in a MongoDB deployment

maxTimeMS

Sets a time limit for an operation
When the operation reaches the limit given by maxTimeMS(), MongoDB interrupts it at the next interrupt point
Sample

Setting a time limit on a query

In the mongo shell, limit the query to 30 ms

- On the location collection, query the town field with maxTimeMS set to 30 ms

db.location.find( { "town": { "$regex": "(Pine Lumber)", "$options": 'i' } } ).maxTimeMS(30)

Running a command that may be interrupted

Run a command that is potentially long-running
This command returns each distinct value of the city field in the collection

db.runCommand( { distinct: "collection", key: "city" } )

A maxTimeMS field of 45 ms can be added to the command as well

db.runCommand( { distinct: "collection",key: "city",maxTimeMS: 45 } )

db.getLastError() and db.getLastErrorObj() return the error for the interrupted operation

killOp

db.killOp() kills a running operation by its operation ID
Command: db.killOp(<opId>)
Open question: does this only terminate the client side, while the operation keeps running inside the database?

Sharded Cluster

Starting with MongoDB 4.0, the killOp command can be run on a mongos to kill queries (read operations) running across the shards of a cluster.
For write operations, killOp on the mongos cannot kill the operations running on each shard.
For shards, see below

Listing operations on shards: refer to localOps of $currentOp on the mongos

[References]

https://www.zerocho.com/category/MongoDB/post/579e2821c097d015000404dc

https://velopert.com/545

https://docs.mongodb.com/manual/reference/method/db.collection.update/

https://docs.mongodb.com/manual/reference/collation-locales-defaults/

https://velopert.com/479 (detailed explanation of find)

https://cinema4dr12.tistory.com/373 (capped collection)


[MongoDB][Study-3] Sharding

2021. 3. 27. 21:26

Sharding의 정의

같은 테이블 스키마를 가진 데이터를 다수의 데이터베이스에 분산하여 저장하는 방법
Application Level 에서도 가능 (RDBMS 에서 Sharding) 하지만, Databas Level에서도 가능 (ex-MongoDB / Redis 등)
Horizontal Partitioning (수평파티션) 이라고도 함

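Applying Sharding

Sharding increases programming and operational complexity
If possible, avoid or delay sharding with the approaches below

Scale Up

Upgrade the hardware specs

If the read load is heavy

Add replicas (replication) to distribute reads

If only certain columns of a table are used frequently

Apply vertical partitioning
Separate data into hot, warm, and cold data and handle each tier differently

Use an in-memory DB
Reduce the number of rows in the table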

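Sharding Methods

Range Sharding (range based sharding)

Data is distributed by splitting the key (shard key) values into ranges
Sharding is relatively simple to set up
No re-sorting cost is incurred when adding capacity
The choice of shard key is important

Depending on the shard key, data can concentrate on some shards (hotspots)
Some DBs end up with very little traffic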
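The range-based routing and hotspot behavior described above can be sketched as follows. This is an illustrative model only: the range table, shard names, and function names are hypothetical, and the shard key is assumed to be numeric.

```javascript
// Hypothetical range table: each shard owns [min, max) of a numeric shard key.
const rangeTable = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 2000, shard: "shardB" },
  { min: 2000, max: Infinity, shard: "shardC" },
];

// Return the shard whose range contains the key.
function routeByRange(key, table) {
  const hit = table.find((r) => key >= r.min && key < r.max);
  if (!hit) throw new Error("no range covers key " + key);
  return hit.shard;
}

// Count how many keys land on each shard.
function countPerShard(keys, table) {
  const counts = {};
  for (const k of keys) {
    const shard = routeByRange(k, table);
    counts[shard] = (counts[shard] || 0) + 1;
  }
  return counts;
}
```

With a monotonically increasing key (an auto-increment ID or a timestamp), every new key falls into the last range, so countPerShard reports all new writes on shardC while the other shards stay idle. That is the hotspot problem noted above.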

Hash Sharding (hash based sharding)

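The key is passed through a hash function and the data is placed according to the result
Prevents hotspots and distributes data evenly
When data must be redistributed (a shard is removed or added), all data has to be redistributed using new hash values (migration is difficult)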
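A minimal sketch of why redistribution is expensive with generic modulo-based hash sharding as described above (this is not MongoDB's hashed shard key implementation). FNV-1a is used here only as a convenient deterministic hash; the function names are hypothetical.

```javascript
// 32-bit FNV-1a hash, chosen only because it is short and deterministic.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Shard index = hash(key) mod number of shards.
function routeByHash(key, shardCount) {
  return fnv1a(String(key)) % shardCount;
}

// Fraction of keys whose shard changes when the shard count changes.
function movedFraction(keys, oldCount, newCount) {
  let moved = 0;
  for (const k of keys) {
    if (routeByHash(k, oldCount) !== routeByHash(k, newCount)) moved++;
  }
  return moved / keys.length;
}
```

With a uniform hash, going from 3 to 4 shards would be expected to leave only about a quarter of the keys in place, which is why adding or removing a shard under this scheme amounts to re-placing most of the data.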

Directory Based Sharding

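A lookup table tracks which data lives on which shard
Similar to range based sharding, but shards are divided by arbitrary criteria, so shards can be organized dynamically

Range and hash sharding are both static, while this approach allows arbitrary, flexible partitioning

Every query consults the lookup table, so the lookup table itself can become a source of problems

The lookup table can become a hotspot
Problems arise if the lookup table is corrupted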
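The lookup table idea can be sketched as a simple in-memory directory. The class name ShardDirectory and the country-code keys in the usage note are hypothetical; a real directory would live in a shared, durable store.

```javascript
// Hypothetical directory: maps a partition key to the shard that holds it.
class ShardDirectory {
  constructor() {
    this.table = new Map();
  }

  // Assign (or reassign) a key to a shard; this is what makes the scheme dynamic.
  assign(key, shard) {
    this.table.set(key, shard);
  }

  // Every query goes through this lookup, so it is a single point of contention.
  lookup(key) {
    if (!this.table.has(key)) throw new Error("no directory entry for " + key);
    return this.table.get(key);
  }
}
```

For example, after directory.assign("KR", "shardA") a later directory.assign("KR", "shardB") moves that key without touching any other mapping. A missing or damaged entry makes the data unreachable, which is the corruption risk listed above.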

MongoDB Sharding

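The main goal is efficiency through distributed processing, so building with three or more shard servers is recommended (minimum two)
Requires at least 20 to 30% more memory than single-node operation (account for the extra memory used by mongos, the oplog, and the balancer process)
At least three active config servers are recommended for a sharded system.

Config servers store and manage the metadata for the sharded system, as well as index information used for fast lookups
As a rule, they are built on servers separate from the shard servers
Config servers can run on lower-spec systems than the shard servers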

Config Server

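Config servers are a required component of a sharded system
At least one is required, and additional config servers must be configured to keep a failure from stopping the service (an HA setup, PSS, is mandatory)
Config servers store the metadata describing how data is distributed across the shard servers (they store and manage shard information)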

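Shard metadata

Holds the list of chunks that mongos works with, along with the range information of each chunk

Distributed lock

Introduced to synchronize communication between the mongos instances and the config servers
Distributed locks are taken for operations that perform sharding work

This prevents problems such as several mongos instances attempting work on the same chunk at the same time
Work can proceed only after acquiring the lock in the locks collection on the config server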

repl_conf:PRIMARY> db.locks.find() 
{ "_id" : "config", "state" : 0, "process" : "ConfigServer", "ts" : ObjectId("5d6b7f15a9f5ecd49052a36f"), "when" : ISODate("2019-09-01T08:19:33.165Z"), "who" : "ConfigServer:conn164", "why" : "createCollection" } 
{ "_id" : "config.system.sessions", "state" : 0, "process" : "ConfigServer", "ts" : ObjectId("5d6b7f15a9f5ecd49052a376"), "when" : ISODate("2019-09-01T08:19:33.172Z"), "who" : "ConfigServer:conn164", "why" : "createCollection" } 
{ "_id" : "testdb", "state" : 0, "process" : "ConfigServer", "ts" : ObjectId("5d62889d3ed72a6b6729a5ca"), "when" : ISODate("2019-08-25T13:09:49.728Z"), "who" : "ConfigServer:conn24", "why" : "shardCollection" } 
{ "_id" : "testdb.testCollection2", "state" : 0, "process" : "ConfigServer", "ts" : ObjectId("5d62889d3ed72a6b6729a5d1"), "when" : ISODate("2019-08-25T13:09:49.738Z"), "who" : "ConfigServer:conn24", "why" : "shardCollection" } 
{ "_id" : "test.testCollection2", "state" : 0, "process" : "ConfigServer", "ts" : ObjectId("5d6288ab3ed72a6b6729a65a"), "when" : ISODate("2019-08-25T13:10:03.834Z"), "who" : "ConfigServer:conn24", "why" : "shardCollection" }

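What the locks protect

Balancer operations
Splitting a collection (split)
Migrating a collection (migration)

LockPinger: this thread communicates with the config server every 30 seconds

It is tied to the balancer and manages the process of acquiring the relevant lock on the config server when reads and writes occur
For details, see http://mongodb.citsoft.net/?page_id=256 (there used to be bugs related to LockPinger errors, but information about them is now hard to find)
LockPinger bug: https://jira.mongodb.org/browse/SERVER-17812

Replica set information: the shard information that mongos manages and must connect to

When mongos performs writes and reads, the config server synchronizes and collects data through mongos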

MongoS

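The process that distributes data to the shard servers (router/balancer)

Locks are used to keep operations consistent while data is being distributed
If the chunk size is not designed appropriately, migrations can degrade performance
Running the balancer only during hours of low DB usage and turning it off otherwise is one way to improve performance

More than one mongos process can be active (multiple mongos instances can be operated)
It can run on the application server (mongos is what the application connects to directly; it can run as a standalone server or, in API form, inside the application server)

Installing mongos locally on the application server is recommended
(The application server reaches the router locally instead of going over the network to a separate router, which is more efficient, and no separate server is needed, which reduces server cost. The downside is the additional management points.)

Caches metadata from the config servers

Provides caching so that the right shard can be found for read and write operations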

root@7d536b10b886:/# mongos --configdb config-replica-set/mongo1:27019,mongo2:27019 --bind_ip 0.0.0.0

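The config servers are registered when mongos starts

Once connected to the config servers, metadata including the sharding policy is sent to all connected config servers

mongos connects to the config servers; when there are several config servers, mongos fails to connect and does not start if even one of them is unreachable
mongos does not store data itself and has no connections to other mongos instances, so it relies on the config servers for data synchronization
How mongos routes queries in the cluster

Determine the list of shards the query must be sent to
Open a cursor on each targeted shard
Receive the results from the target shards, merge the data, and return the result to the client
In MongoDB 3.6, aggregation queries changed so that the merge is not done on a shard: mongos receives the results, merges them, and returns the result
Two cases in which the pipeline cannot run on mongos

The merge part of the split pipeline contains a stage that must run on the primary shard

If a $lookup aggregation runs against an unsharded collection in the same database as the sharded collection being aggregated, the merge must run on the primary shard

The split pipeline contains a stage such as $group that may need to write temporary data to disk, and the client has set allowDiskUse to true; in that case mongos cannot be used
In this case the merge pipeline does not run on the primary shard but on a shard chosen at random from the shards targeted by the merge

To see how an aggregation query behaves on a sharded cluster, run the aggregation with explain: true
mergeType shows where the merge stage runs ("primaryShard", "anyShard", or "mongos"), and the split pipeline shows the operations executed on the individual shards.
See https://docs.mongodb.com/manual/core/sharded-cluster-query-router/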
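The three routing steps listed above (pick the target shards, query each of them, merge the results) can be sketched with a small in-memory model. Everything here is hypothetical and simplified: shards are plain objects owning a [min, max) shard-key range, filters are equality-only, and the merge is a simple numeric sort.

```javascript
// Step 1: a filter on the shard key targets one shard; otherwise broadcast to all.
function targetShards(shards, shardKey, filter) {
  if (filter[shardKey] === undefined) return shards;
  const value = filter[shardKey];
  return shards.filter((s) => value >= s.min && value < s.max);
}

// Equality-only matching, enough for the illustration.
function matches(doc, filter) {
  return Object.keys(filter).every((field) => doc[field] === filter[field]);
}

// Steps 2 and 3: query each target shard, then merge and sort on the router.
function routeFind(shards, shardKey, filter, sortField) {
  const targets = targetShards(shards, shardKey, filter);
  const partials = targets.map((s) => s.docs.filter((d) => matches(d, filter)));
  const merged = partials.flat().sort((a, b) => a[sortField] - b[sortField]);
  return { shardsQueried: targets.map((s) => s.name), docs: merged };
}
```

A query that includes the shard key touches a single shard, while a query on any other field is sent to every shard and merged on the router. That difference is a large part of why shard key choice affects read performance.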

Chunk

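A collection is partitioned into pieces that are distributed across the shard servers; each piece of data is called a chunk
Chunks must be stored evenly across the shard servers for good performance
To keep storage even, large chunks are split into smaller ones, and chunks are migrated from shards with many chunks to shards with few
The default chunk size is 64 MB and can be changed (or 100,000 rows)

Considerations when changing the chunk size

A smaller chunk size triggers more frequent migrations, which can degrade performance, but distributes data more evenly (it also adds cost on mongos)
A larger chunk size reduces migration frequency, which is more efficient in terms of network and mongos overhead, but can lead to uneven distribution
The chunk size is proportional to the number of documents in a chunk to migrate (the larger the chunk, the more documents it holds)
The chunk size affects the maximum collection size when sharding an existing collection; after sharding, the chunk size does not constrain the collection size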

Chunk Split

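Splits occur to keep chunks within a shard from growing too large
A split happens when a chunk exceeds the configured chunk size, or when the number of documents in the chunk exceeds the maximum number of documents per chunk to migrate
The split is made on shard key values
Splits are triggered by inserts or updates
A split changes metadata and makes the data more evenly divided
Even after splits, chunks may not be evenly distributed across shards

In that case the balancer redistributes chunks across the shards
See the cluster balancer section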
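The split rule above can be sketched as follows. This is a simplified model, not MongoDB's actual split-point selection: the chunk shape, the median-based split key, and the function name splitChunk are assumptions made for illustration. The jumbo flag mirrors the term MongoDB uses for chunks that cannot be split.

```javascript
// chunk: { min, max, docs: [{ key, bytes }] } with a numeric shard key.
function splitChunk(chunk, maxChunkBytes) {
  const totalBytes = chunk.docs.reduce((sum, d) => sum + d.bytes, 0);
  if (totalBytes <= maxChunkBytes) return [chunk]; // under the limit: no split

  const keys = chunk.docs.map((d) => d.key).sort((a, b) => a - b);
  let splitKey = keys[Math.floor(keys.length / 2)];
  if (splitKey === keys[0]) {
    // Median equals the minimum: look for any larger key to split on.
    splitKey = keys.find((k) => k > keys[0]);
  }
  if (splitKey === undefined) {
    // Every document shares one shard key value, so no split point exists.
    return [{ ...chunk, jumbo: true }];
  }
  return [
    { min: chunk.min, max: splitKey, docs: chunk.docs.filter((d) => d.key < splitKey) },
    { min: splitKey, max: chunk.max, docs: chunk.docs.filter((d) => d.key >= splitKey) },
  ];
}
```

The last branch shows why a low-cardinality shard key is a problem: when all documents in an oversized chunk share one key value, there is nowhere to cut, and the chunk keeps growing past the configured size.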

Chunk Migration

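Chunks are migrated so that they are evenly distributed across shards

The balancer process sends the moveChunk command to the source shard (the shard that needs a chunk migration = source shard)
The source shard starts the move with the moveChunk command

Operations on the chunk being moved are still routed to the source shard during the move, and the source shard remains responsible for incoming writes

The target shard builds any indexes required for the chunk
The target shard requests the documents in the chunk and starts receiving copies of the data
After receiving the last document in the chunk, the target shard starts a synchronization process once more to pick up any changes made during the migration
Once fully synchronized, the source shard connects to the config server and updates the cluster metadata with the chunk's new location (author's opinion: mongos may be involved here)
After the source shard finishes updating the metadata and no cursors remain open on the chunk, the source shard deletes its copy of the migrated data

When several chunks need to be migrated from a shard, the balancer migrates them one at a time

Starting with 3.4, chunk migrations can run in parallel. A shard takes part in at most one migration at a time, so a sharded cluster with n shards can run at most n/2 migrations simultaneously

(Previously only one chunk migration could run in the cluster at a time; from 3.4, up to n/2 can run, still with only one chunk migration per shard)

However, the balancer does not wait for the delete phase of the moved chunk before starting the next migration (asynchronous chunk migration cleanup)

As a result, the delete phase can sometimes take a long time
In MongoDB 4.4, if a failover occurs during the delete phase

Previously, chunks that should have been deleted were left behind when a failure occurred during the delete phase; in 4.4, cleanup continues after failover or restart
See https://docs.mongodb.com/manual/core/sharding-balancer-administration/#sharding-migration-thresholds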

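By default the balancer does not move chunks that are too large to migrate; setting the balancer option attemptToBalanceJumboChunks to true lets the balancer attempt to migrate them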
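Migration conditions

Manual

Use only in limited cases, such as when data must be distributed during a bulk insert
See the manual migration documentation

Automatic

The balancer process moves chunks when it judges that the chunks of a collection have become fragmented and are no longer evenly distributed
Migration thresholds

The balancer starts migrating when the difference in chunk count between the shard with the most chunks and the shard with the fewest reaches the threshold in the table below (for example, 2 when the collection has fewer than 20 chunks, 8 when it has 80 or more)
Frequent migrations mean chunks are constantly being moved, which can cause performance delays
An appropriate chunk size is needed to keep migrations at a reasonable frequency

Number of chunks	Migration threshold
Fewer than 20	2
20 to 79	4
80 or more	8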
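The threshold table and the one-migration-per-shard rule described above can be sketched as follows. This assumes, following the balancer documentation linked above, that the threshold is compared against the difference between the most and least loaded shards, and it treats "reaching" the threshold as greater than or equal. The pairing strategy in planMigrations is a hypothetical simplification, not the balancer's actual algorithm.

```javascript
// Threshold from the table, based on the collection's total chunk count.
function migrationThreshold(totalChunks) {
  if (totalChunks < 20) return 2;
  if (totalChunks < 80) return 4;
  return 8;
}

// True when the gap between the fullest and emptiest shard reaches the threshold.
function needsBalancing(chunksPerShard) {
  const counts = Object.values(chunksPerShard);
  const total = counts.reduce((a, b) => a + b, 0);
  const gap = Math.max(...counts) - Math.min(...counts);
  return gap >= migrationThreshold(total);
}

// Pair the fullest shards with the emptiest ones; each shard joins at most
// one migration, so at most floor(n / 2) migrations run at once.
function planMigrations(chunksPerShard) {
  const names = Object.keys(chunksPerShard)
    .sort((a, b) => chunksPerShard[b] - chunksPerShard[a]);
  const total = names.reduce((sum, n) => sum + chunksPerShard[n], 0);
  const threshold = migrationThreshold(total);
  const plan = [];
  let i = 0;
  let j = names.length - 1;
  while (i < j && chunksPerShard[names[i]] - chunksPerShard[names[j]] >= threshold) {
    plan.push({ from: names[i], to: names[j] });
    i++;
    j--;
  }
  return plan;
}
```

For a four-shard cluster with chunk counts { a: 10, b: 2, c: 9, d: 1 }, the total is 22, so the threshold is 4, and the plan contains two simultaneous migrations (a to d, c to b), which is the n/2 upper bound.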

Sharded Cluster Balancer

https://docs.mongodb.com/manual/core/sharding-balancer-administration/#sharding-internals-balancing

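A background process that monitors the number of chunks on each shard
When the chunk counts reach the migration threshold, the balancer automatically migrates chunks between shards to keep the same number of chunks per shard

It redistributes the chunks of a sharded collection evenly across all shards (the balancer is enabled by default)
The balancer keeps running until chunks are even across shards (for the migration itself, see Chunk Migration above)

Balancing is transparent to users and the application layer, but it adds load while migrations are in progress
The balancer runs on the primary of the config server replica set
The balancer can be disabled for maintenance, and its running window can be configured
Example: balancing runs automatically starting at 02:00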

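# Run and check on the config server
repl_conf:PRIMARY> db.settings.update (
... {_id:"balancer"},
... {$set : {activeWindow: {start: "02:00", stop : "06:00" } } },
... {upsert : true } )
WriteResult({ "nMatched" : 0, "nUpserted" : 1, "nModified" : 0, "_id" : "balanver" })
repl_conf:PRIMARY> db.settings.find()
{ "_id" : "chunksize", "value" : 64 }
{ "_id" : "balancer", "stopped" : false }
{ "_id" : "autosplit", "enabled" : true }
{ "_id" : "balanver", "activeWindow" : { "start" : "02:00", "stop" : "06:00" } }

Note: the output shows "_id" : "balanver", so the command that was actually run had a typo in _id and the window was upserted into a separate document. With the correct _id "balancer", the existing balancer document is matched and activeWindow is set on it.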
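To make the meaning of the activeWindow setting concrete, the check below decides whether a given time of day falls inside a window such as 02:00 to 06:00. It is an illustrative sketch only: the function names are hypothetical, and treating the stop time as exclusive is an assumption rather than documented balancer behavior.

```javascript
// Convert "HH:MM" to minutes since midnight.
function toMinutes(hhmm) {
  const [hours, minutes] = hhmm.split(":").map(Number);
  return hours * 60 + minutes;
}

// True if `nowHHMM` is inside the window { start, stop }.
// Assumption: start is inclusive and stop is exclusive.
function inActiveWindow(nowHHMM, window) {
  const now = toMinutes(nowHHMM);
  const start = toMinutes(window.start);
  const stop = toMinutes(window.stop);
  if (start <= stop) return now >= start && now < stop;
  // The window wraps past midnight, e.g. 23:00 to 01:00.
  return now >= start || now < stop;
}
```

For example, inActiveWindow("03:30", { start: "02:00", stop: "06:00" }) is true, while the same call at "12:00" is false, so migrations would be held back during daytime traffic.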

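Adding and removing shards

A newly added shard has no chunks, so the cluster becomes unbalanced
Removing a shard from the cluster redistributes its resident chunks across the cluster, which causes an imbalance just as adding a shard does
Whether adding or removing a shard, the resulting migrations take time
More efficient approaches need further investigation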

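Cautions for a Sharded System

When data concentrates on one shard server and is not evenly distributed

When the shard key field has low cardinality, chunks are not necessarily split at 64 MB

Chunks are sometimes created far larger than 64 MB
Because migration is triggered by chunk counts (around 8 chunks by default), fewer chunks are moved to other servers, and data naturally concentrates on a single shard server

This is the problem caused by a poorly chosen shard key (the importance of the shard key)
If data does not seem evenly distributed, reducing the chunk size is recommended (more frequent migrations even out the distribution, at the cost of performance)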

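When IO traffic increases on a particular shard server

MongoDB shard servers perform splits and migrations so that data with the same shard key is stored on the same shard server
IO traffic on a shard server increases when a field with too low cardinality is chosen as the shard key
A chunk where data is concentrated is called a hot chunk, and the servers holding hot chunks see relatively higher IO traffic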

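When the sharded cluster is not evenly balanced

Load was balanced properly at insert time, but the balance breaks if data on a particular server is later deleted or moved to other storage as users require
Low cardinality of the shard key
Imbalance occurs when chunks are redistributed more slowly than data is bulk-inserted into a collection
This happens when chunk migration cannot keep up with the insert rate because of limited network bandwidth
Frequent migrations during daily peak hours reduce system resource efficiency and cause performance delays

Configure the balancer to stop chunk migration during peak hours and run during idle hours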

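When excessive chunk migration stalls the cluster

Frequent chunk migrations can temporarily slow down the entire cluster
If the number of migrations cannot be reduced, configure the balancer to run during idle hours
An unnecessarily large chunk size increases network traffic and reduces system resource efficiency, so consider reducing the chunk size
Poor balance across shard servers means there are too few shard servers for the volume of data being generated, so consider adding shard servers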

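When writes are slow and fast lookups are not possible

The size of the collection matters when tens of thousands of documents or more are written per second
A collection is made up of multiple extents (a concept of the older MMAPv1 storage engine); if extents are created too small, frequent extent allocation causes unnecessary waits
Create the collection with sufficiently large extents (size can be set by creating the collection explicitly with createCollection)
When fast writes are required, a rich document structure is advantageous (see Data Model Design; also consider embedded documents)
Memory shortage can also degrade performance, so consider adding memory

Summary: the shard key matters in every case (1 to 5); run the balancer during idle hours (3); use an appropriate number of shards (4); also check memory (5)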
