My app is becoming a bit of a nightmare, and this is a non-standard MongoDB problem that I haven't been able to find covered elsewhere.
My server flow is like this:
- A user uploads a list of objects containing {names, emails and company domains} to my server.
- My server turns all of these into Person objects.
- Once the Person has been saved, I search MongoDB to see whether a record for the person's Domain exists.
- If it exists, I add the Person's Mongo _id to a list of users from that Domain.
- If it doesn't exist, I create a new Domain document and save it.
This works in theory, BUT, because everything is asynchronous, I'm sometimes sending thousands of Person objects to the Domain saver function at once. Which means (at least, this is what I think is happening):
- Mongo searches for "Domain 1", sees there is no document, so creates one and then saves it.
- While this is still happening, Mongo searches for "Domain 1" on behalf of a separate user. No document has been saved yet, so it finds none and makes a new one.
- Now I have two documents with the same Domain identifier.
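The interleaving described above can be reproduced without a database. The following is a minimal sketch, assuming an in-memory array stands in for the collection and that each simulated call resolves on a later event-loop turn; `makeCollection`, `naiveSave` and `runNaive` are hypothetical names for illustration, not Mongoose APIs.

```javascript
// Simulated round trip: resolves on a later turn of the event loop.
const tick = () => new Promise((resolve) => setImmediate(resolve));

// Hypothetical in-memory stand-in for the Domain collection.
function makeCollection() {
  const docs = [];
  return {
    docs,
    async findOne(domain) {
      await tick();
      return docs.find((d) => d.domain === domain) || null;
    },
    async insert(doc) {
      await tick();
      docs.push(doc);
    },
  };
}

// Check-then-insert: the same two-step shape as findOne() followed by save().
async function naiveSave(coll, domain, user) {
  const rec = await coll.findOne(domain);
  if (rec) {
    rec.users.push(user);
  } else {
    await coll.insert({ domain, users: [user] });
  }
}

// Fire every save at once, as happens when a whole upload is processed.
async function runNaive(users) {
  const coll = makeCollection();
  await Promise.all(users.map((u) => naiveSave(coll, 'dom1.com', u)));
  return coll.docs.length;
}

runNaive(['James', 'Phil', 'Jess', 'Chris']).then((count) => {
  console.log('documents created for dom1.com:', count);
});
```

Every lookup is answered before any insert has landed, so more than one document ends up being created for the same domain.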
Here's the code I'm currently using:
Domain.findOne({
  domain: domn
}, function(err, rec) {
  if (err) {
    console.log("Domain finding error: " + err)
    bigCount.doneDoms++;
    checkCount()
  } else if (rec) {
    var tempObj = {}
    tempObj['$addToSet'] = { users: id }
    tempObj['$addToSet'].emails = user.email;
    if (userDoms.indexOf(rec._id) === -1) {
      userDoms.push(rec._id)
    }
    Domain.update({domain: domn}, tempObj, function(err) {
      if (err) {
        console.log("Old rec save Error: " + err)
        bigCount.doneDoms++;
        checkCount();
      } else {
        // Saved Document
      }
    });
  } else {
    var newDom = new Domain();
    newDom.domain = user.domain;
    newDom.company = user.company;
    newDom.users = [];
    newDom.users.push(id);
    newDom.emails = [];
    newDom.emails.push(user.email);
    newDom.save(function(err, record) {
      if (err) {
        console.log("Dom save error: " + err)
      } else {
        // Saved Document
      }
    });
  }
})
Perhaps a stripped-down version of the question is: how do I handle something like this?
var arr = [
  {dom: 'dom1.com', user: 'James'},
  {dom: 'dom1.com', user: "Phil"},
  {dom: 'dom1.com', user: "Jess"},
  // ...x1000...
  {dom: 'dom1.com', user: "Chris"}
];
arr.forEach(function(item) {
  var dom = item.dom;
  var user = item.user;
  Domain.findOne({domain: dom}, function(err, rec) {
    if (rec) {
      // Update old rec
      if (rec.users.indexOf(user) === -1) {
        rec.users.push(user);
      }
      rec.save();
    } else {
      // Make a new rec
      var newRec = new Domain();
      newRec.domain = dom;
      newRec.users = [user];
      newRec.save();
    }
  });
});
Due to speed/async, a lot of records are going to be created here, when really I only want one.
2 solutions
#1
2
Personally, I would say that you are going about things the wrong way around, and that you could use some flow control here.
Regardless of the "list" source, the general flow that should be happening is:
- Instantiate an object for the user (you get an _id in return, after all).
- Look for the domain data; if it exists, use it, and if not, create it while adding the user at the same time (very possible).
- Finally, add the matched domain to the user and save them.
This all follows a pattern easily achieved with .findOneAndUpdate() along with the "upsert" option, which will create a new document if none is found and, with "new": true set, at any rate return the resulting document, either found or created.
So with some helpers from the node async library, here it is:
var async = require('async'),
    mongoose = require('mongoose'),
    Schema = mongoose.Schema;

var userSchema = new Schema({
  name: { type: String, required: true },
  domain: { type: Schema.Types.ObjectId, ref: 'Domain' }
});

var domainSchema = new Schema({
  name: { type: String, required: true },
  users: [{ type: Schema.Types.ObjectId, ref: 'User' }]
});

var User = mongoose.model('User',userSchema),
    Domain = mongoose.model('Domain',domainSchema);

mongoose.connect('mongodb://localhost/domains');

var arr = [
  {dom: 'dom1.com', user: "James"},
  {dom: 'dom2.com', user: "Phil"},
  {dom: 'dom1.com', user: "Jess"},
  {dom: 'dom1.com', user: "Chris"},
  {dom: 'dom3.com', user: "Jesse"}
];

async.series(
  [
    // Clean removal of data for demo
    function(callback) {
      async.each([User,Domain],function(model,callback) {
        model.remove({},callback);
      },callback);
    },

    // The actual insertion process
    function(callback) {
      async.eachLimit(arr,10,function(item,callback) {
        var user = new User({ name: item.user });
        // user already has the _id
        Domain.findOneAndUpdate(
          { "name": item.dom },
          { "$push": { "users": user._id } },
          { "new": true, "upsert": true },
          function(err,domain) {
            // return so the callback is not invoked a second time below
            if (err) return callback(err);
            user.domain = domain._id;   // always returns something
            // now save the user
            user.save(callback);
          }
        );
      },callback);
    },

    // List back populated as the proof
    function(callback) {
      User.find({}).populate('domain').exec(function(err,users) {
        if (err) return callback(err);
        //console.log(users);
        //callback();
        var options = {
          path: 'domain.users',
          model: 'User'
        };
        User.populate(users,options,function(err,results) {
          if (err) return callback(err);
          console.log( JSON.stringify( results, undefined, 2 ) );
          callback();
        });
      });
    }
  ],
  function(err) {
    if (err) throw err;
    mongoose.disconnect();
  }
);
And that will produce output like:
[
{
"_id": "55e6aa0e85e8b9102179f5c2",
"domain": {
"_id": "55e6aa0ecb536c5a93574ff5",
"name": "dom1.com",
"__v": 0,
"users": [
{
"_id": "55e6aa0e85e8b9102179f5c2",
"domain": "55e6aa0ecb536c5a93574ff5",
"name": "James",
"__v": 0
},
{
"_id": "55e6aa0e85e8b9102179f5c4",
"domain": "55e6aa0ecb536c5a93574ff5",
"name": "Jess",
"__v": 0
},
{
"_id": "55e6aa0e85e8b9102179f5c5",
"domain": "55e6aa0ecb536c5a93574ff5",
"name": "Chris",
"__v": 0
}
]
},
"name": "James",
"__v": 0
},
{
"_id": "55e6aa0e85e8b9102179f5c3",
"domain": {
"_id": "55e6aa0ecb536c5a93574ff6",
"name": "dom2.com",
"__v": 0,
"users": [
{
"_id": "55e6aa0e85e8b9102179f5c3",
"domain": "55e6aa0ecb536c5a93574ff6",
"name": "Phil",
"__v": 0
}
]
},
"name": "Phil",
"__v": 0
},
{
"_id": "55e6aa0e85e8b9102179f5c4",
"domain": {
"_id": "55e6aa0ecb536c5a93574ff5",
"name": "dom1.com",
"__v": 0,
"users": [
{
"_id": "55e6aa0e85e8b9102179f5c2",
"domain": "55e6aa0ecb536c5a93574ff5",
"name": "James",
"__v": 0
},
{
"_id": "55e6aa0e85e8b9102179f5c4",
"domain": "55e6aa0ecb536c5a93574ff5",
"name": "Jess",
"__v": 0
},
{
"_id": "55e6aa0e85e8b9102179f5c5",
"domain": "55e6aa0ecb536c5a93574ff5",
"name": "Chris",
"__v": 0
}
]
},
"name": "Jess",
"__v": 0
},
{
"_id": "55e6aa0e85e8b9102179f5c5",
"domain": {
"_id": "55e6aa0ecb536c5a93574ff5",
"name": "dom1.com",
"__v": 0,
"users": [
{
"_id": "55e6aa0e85e8b9102179f5c2",
"domain": "55e6aa0ecb536c5a93574ff5",
"name": "James",
"__v": 0
},
{
"_id": "55e6aa0e85e8b9102179f5c4",
"domain": "55e6aa0ecb536c5a93574ff5",
"name": "Jess",
"__v": 0
},
{
"_id": "55e6aa0e85e8b9102179f5c5",
"domain": "55e6aa0ecb536c5a93574ff5",
"name": "Chris",
"__v": 0
}
]
},
"name": "Chris",
"__v": 0
},
{
"_id": "55e6aa0e85e8b9102179f5c6",
"domain": {
"_id": "55e6aa0ecb536c5a93574ff7",
"name": "dom3.com",
"__v": 0,
"users": [
{
"_id": "55e6aa0e85e8b9102179f5c6",
"domain": "55e6aa0ecb536c5a93574ff7",
"name": "Jesse",
"__v": 0
}
]
},
"name": "Jesse",
"__v": 0
}
]
So all domains either got created or were re-used when they existed, and we added the user to the list in there at the same time using $push, since we already had the _id for the user once the instance was created.
With the domain document returned, whether new or found, you simply set the domain on the user and save it.
async.eachLimit is also a "special" variant that "limits" the number of concurrent operations running in the loop. This is wise practice in real-world scenarios, since you don't want every update happening at the same time.
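To make the idea of a concurrency limit concrete, here is a minimal sketch of the technique in plain JavaScript. It is not the async library's implementation; `eachLimitSketch` and `demo` are hypothetical names, and the simulated work is just one turn of the event loop.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function eachLimitSketch(items, limit, worker) {
  let next = 0;
  // Each lane pulls the next unprocessed item until none are left.
  async function lane() {
    while (next < items.length) {
      const item = items[next++];
      await worker(item);
    }
  }
  const lanes = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    lanes.push(lane());
  }
  await Promise.all(lanes);
}

// Measure the highest number of workers that were ever running together.
async function demo(limit) {
  let inFlight = 0;
  let peak = 0;
  const items = Array.from({ length: 25 }, (_, i) => i);
  await eachLimitSketch(items, limit, async () => {
    inFlight++;
    peak = Math.max(peak, inFlight);
    await new Promise((resolve) => setImmediate(resolve)); // simulated I/O
    inFlight--;
  });
  return peak;
}

demo(10).then((peak) => console.log('peak concurrency:', peak));
```

With 25 items and a limit of 10, the peak never goes above 10, whereas starting every call at once would put all 25 in flight together.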
Also, regardless of the process, the "Domain" should not get created more than once. The upsert is a single atomic operation on the server rather than a separate find followed by a save, so you only get back an existing document or the new one, depending on what was there at the time of the request. To make this a hard guarantee under concurrent upserts, the MongoDB documentation recommends a unique index on the queried field (here "name").
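The difference that a single-step upsert makes can be sketched without a database. This is an illustration under stated assumptions: an in-memory array stands in for the collection, and `findOneAndUpsert` is a hypothetical helper that performs the lookup, the optional create and the push in one uninterrupted step. It is not how MongoDB is implemented, only a model of the behaviour described above.

```javascript
// Simulated round trip: resolves on a later turn of the event loop.
const tick = () => new Promise((resolve) => setImmediate(resolve));

// Hypothetical in-memory stand-in for the Domain collection.
function makeCollection() {
  const docs = [];
  return {
    docs,
    // Lookup, optional create and push all run in one synchronous section
    // after the simulated round trip, so no other call can interleave.
    async findOneAndUpsert(name, userId) {
      await tick();
      let doc = docs.find((d) => d.name === name);
      if (!doc) {
        doc = { name, users: [] };
        docs.push(doc);
      }
      doc.users.push(userId);
      return doc; // always a document, found or created
    },
  };
}

// Fire every upsert at once, with no concurrency limit at all.
async function runUpsert(items) {
  const coll = makeCollection();
  await Promise.all(
    items.map((item) => coll.findOneAndUpsert(item.dom, item.user))
  );
  return coll.docs;
}

runUpsert([
  { dom: 'dom1.com', user: 'James' },
  { dom: 'dom2.com', user: 'Phil' },
  { dom: 'dom1.com', user: 'Jess' },
  { dom: 'dom1.com', user: 'Chris' },
  { dom: 'dom3.com', user: 'Jesse' },
]).then((docs) => console.log(JSON.stringify(docs, null, 2)));
```

Even with all five calls started together, each domain ends up with exactly one document, because there is no gap between the check and the write for another call to slip into.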
As you can see in the output, everything can populate nicely so the "User" and "Domain" details are visible at all levels.
The moral of the story is "Don't tie yourself in knots persisting one thing and then altering it again and again and again". Just do it once and be done with it. It's certainly faster.
#2
1
$addToSet together with async does exactly what you're looking for:
var items = [
  {dom: 'dom1.com', user: "Johnny"},
  {dom: 'dom1.com', user: "Doggie"},
  {dom: 'dom1.com', user: "Lisa"},
  {dom: 'dom2.com', user: "Mark"},
  {dom: 'dom3.com', user: "Denny"}
];

async.each(items, function(item, callback){
  Domain.findOneAndUpdate(
    {domain: item.dom},
    {$addToSet: {users: item.user}},
    { "new": true, "upsert": true },
    callback
  );
},
function (err){
  // done / handle errors
}
);
Output:
[{
"_id": ObjectID("55e6c9bf63006d730254ea8b"),
"domain": "dom1.com",
"users": [
"Johnny",
"Doggie",
"Lisa"
]
},
{
"_id": ObjectID("55e6c9bf63006d730254ea8c"),
"domain": "dom2.com",
"users": [
"Mark"
]
},
{
"_id": ObjectID("55e6c9bf63006d730254ea8d"),
"domain": "dom3.com",
"users": [
"Denny"
]
}]
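The practical difference between $addToSet and $push shows up when the same user appears twice in an upload. The sketch below uses plain arrays as stand-ins for the two operators; `pushOp` and `addToSetOp` are hypothetical helpers for illustration only, and the comparison covers primitive values such as the user names used here.

```javascript
// Stand-in for $push: always appends the value.
function pushOp(users, value) {
  users.push(value);
  return users;
}

// Stand-in for $addToSet: appends the value only if it is not already present.
function addToSetOp(users, value) {
  if (!users.includes(value)) {
    users.push(value);
  }
  return users;
}

// An upload in which "Johnny" appears twice.
var uploads = ['Johnny', 'Doggie', 'Johnny', 'Lisa'];

var pushed = uploads.reduce(pushOp, []);
var asSet = uploads.reduce(addToSetOp, []);

console.log(pushed); // [ 'Johnny', 'Doggie', 'Johnny', 'Lisa' ]
console.log(asSet);  // [ 'Johnny', 'Doggie', 'Lisa' ]
```

So $push suits the case where every value is known to be new (such as a freshly generated _id), while $addToSet is the safer choice when the same value might be submitted more than once.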